Wikipedia:Bots/Requests for approval/DustyBot
Automatic or Manually Assisted: Automatic
Programming Language(s): PHP
Function Summary: Update WP:DUSTY
Edit period(s) (e.g. Continuous, daily, one time run): Daily
Already has a bot flag (Y/N): N
Function Details: The list of dusty pages, linked from SpecialPages, is several months out of date. The list should be regenerated as people update the pages; in practice, this could be done once per day. DustyBot will do this in two stages. The first stage will generate a list of ~10,000 dusty pages from the most recent database dump. This requires tens of thousands of page accesses to find and discard disambiguation pages. Fortunately, this only needs to be done when a new database dump is available, which happens once every couple of months. The list will be built over the course of several days, keeping page accesses below 10/min. The second stage will scan this list once per day for the 100 pages that are still dusty, and post them at Wikipedia:Dusty articles. Because this bot will only edit Wikipedia once per day, and will only change one hard-coded page, the risk of interfering with other editors is low. I am interested in hearing ideas about how to reduce the number of page accesses.
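As a rough illustration of the 10/min limit described above, here is a minimal throttling sketch. It is written in Python for brevity and is not DustyBot's PHP code. The names `Throttle` and `estimated_days` are hypothetical, and the figure of 30,000 accesses used below is only an example of "tens of thousands".

```python
import time


class Throttle:
    """Space out calls so that at most `max_per_minute` happen per minute.

    `clock` and `sleep` are injectable so the class can be tested
    without actually waiting.
    """

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous access, if any

    def wait(self):
        """Block until enough time has passed since the previous access."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def estimated_days(page_accesses, max_per_minute):
    """Days needed to make `page_accesses` requests at the given rate."""
    return page_accesses / max_per_minute / 60 / 24
```

Calling `wait()` before each page access keeps requests at least 6 seconds apart when `max_per_minute` is 10. Under the example figure, `estimated_days(30000, 10)` comes to a little over two days, which is in line with building the list "over the course of several days".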
Discussion
Which db dump are you using? If the dump doesn't contain the page text, you should be able to generate lists of disambig pages from the templates at MediaWiki:Disambiguationspage using the API, or ask someone with toolserver access to do a query. Also, I'm somewhat confused: it "will generate a list of ~10,000 dusty pages" then scan "for the 100 pages that are still dusty." What will it actually be reporting on Wikipedia? Mr.Z-man 07:24, 13 October 2008 (UTC)
- I'm using page.sql.gz from the 10/08 dump. I could process pages-articles.xml.bz2 instead, which would eliminate the need to check individual pages, but that would mean downloading a 4 GB file instead of tens or hundreds of MB. The list of 10,000 potentially dusty pages is just the first stage and is not posted to Wikipedia. The pages on that list are either really dusty or have been very recently edited. The second stage goes through that list, weeding out the recently updated ones, until it has a list of 100 pages. That list of checked pages will be posted to Wikipedia. Wronkiew (talk) 15:59, 13 October 2008 (UTC)
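The second stage described in the reply above could be sketched roughly as follows. This is an illustrative Python sketch, not the bot's PHP code. It assumes each candidate carries the timestamp recorded in the dump as a sortable string, that candidates are ordered dustiest first, and that `current_timestamp` is a hypothetical callback returning a page's latest edit timestamp (one page access per call).

```python
from itertools import islice


def still_dusty(candidates, current_timestamp, limit=100):
    """Return up to `limit` titles that have not been edited since the dump.

    candidates: iterable of (title, dump_timestamp) pairs, dustiest first.
    current_timestamp: hypothetical callback, title -> latest edit timestamp.
    """

    def unchanged():
        for title, dump_ts in candidates:
            # A newer live timestamp means the page was edited after the
            # dump, so it is no longer dusty and is weeded out.
            if current_timestamp(title) <= dump_ts:
                yield title

    # islice stops pulling from the generator once `limit` titles are
    # found, so no further pages are checked after that point.
    return list(islice(unchanged(), limit))
```

Because the scan is lazy, the number of page accesses per day is the 100 reported pages plus however many recently edited candidates precede them, rather than the full list of ~10,000.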