
Wikipedia:WikiProject Disambiguation/Database dump analysis

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Wangi at 15:01, 25 January 2006 (sp). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A database dump is a backup of all Wikipedia pages that can be downloaded. Once downloaded, extensive analysis can be performed on the dump (this can't be done by scraping the live servers, because that creates excessive load).

Database dump analysis can help WikiProject Disambiguation achieve its goals by providing editors with extra information.
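Every analysis below starts by reading pages out of the dump. The following is a minimal sketch of one way to do that in Python, assuming the standard MediaWiki XML export format and a dump with one revision per page. The function names (`iter_pages`, `iter_pages_from_file`, `local_name`) and the sample page are hypothetical illustrations, not the scripts actually used for the reports on this page.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple


def local_name(tag: str) -> str:
    """Drop the XML namespace, e.g. '{http://...}page' -> 'page'.

    The export namespace URI changes between dump versions, so matching on
    the local name avoids hard-coding one version.
    """
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for each <page> in a MediaWiki XML dump.

    Assumes one revision per page; with a full-history dump only the last
    <text> seen for each page would be kept.
    """
    title, text = "", ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = "", ""
            elem.clear()  # free the finished page's subtree


def iter_pages_from_file(path: str) -> Iterator[Tuple[str, str]]:
    """Stream pages straight from a bz2-compressed dump file."""
    with bz2.open(path, "rb") as fh:
        yield from iter_pages(fh)


# Hypothetical one-page dump used for illustration.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Mercury</title>
    <revision><text>'''Mercury''' may refer to:</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE)):
        print(page_title, len(wikitext))
```

Streaming with `iterparse` rather than loading the whole file matters here, since a full dump is far larger than memory.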

Currently run dump analyses

Dump date    Source       Pages     Links
2005-11-13   articles     32166    410987
             templates      936      1207
             Σ            33102    412194
2005-12-13   articles     34126    425120
             templates      349       400
             Σ            34475    425520

Proposal: tracking down dab pages with suspect style

At WP:DAB, Wangi expressed interest in using the dumps to improve dab page style by tracking down suspect dab pages. One could argue that Category:Disambiguation pages in need of cleanup is always well stocked, so a dump analysis to find more troublesome dabs is unnecessary. Then again, who could have foreseen the activity around the From templates report that led to its completion?

Ideas

Image and template checks...

Dab pages are checked for:

  • Images
  • Templates (other than dab templates, naturally; this includes stub templates, etc.)
Images and templates indicate that a dab page is verging on article status. An expert can then examine the dab and merge it, start a discussion, etc.
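A minimal sketch of the image and template check, assuming the wikitext of each dab page is already available (for example from a dump reader). The set `DAB_TEMPLATES` is a hypothetical, incomplete list; a real run would need the full set of dab templates and their redirects. The regular expressions are simplified and will not handle every wikitext edge case.

```python
import re
from typing import List, Tuple

# Hypothetical, lower-cased list of dab template names. The real set,
# including redirects to the dab templates, would have to be collected first.
DAB_TEMPLATES = {"disambig", "dab", "disambiguation"}

# [[Image:...]] links; newer dumps may also use the File: prefix.
IMAGE_RE = re.compile(r"\[\[\s*(?:Image|File)\s*:", re.IGNORECASE)
# The name part of {{name}} or {{name|...}}.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def template_name(raw: str) -> str:
    """Normalise a template call to a lower-case name without 'Template:'."""
    name = raw.replace("_", " ").strip()
    if name.lower().startswith("template:"):
        name = name[len("template:"):].strip()
    # Simplification: MediaWiki only ignores case in the first letter.
    return name.lower()


def image_and_template_check(wikitext: str) -> Tuple[int, List[str]]:
    """Return (image count, non-dab template names) for one dab page."""
    images = len(IMAGE_RE.findall(wikitext))
    others = [
        name
        for name in map(template_name, TEMPLATE_RE.findall(wikitext))
        # Skip dab templates and parser functions such as {{#if:...}}.
        if name not in DAB_TEMPLATES and not name.startswith("#")
    ]
    return images, others


def is_suspect(wikitext: str) -> bool:
    """Flag a dab page that contains any image or non-dab template."""
    images, others = image_and_template_check(wikitext)
    return images > 0 or bool(others)
```

Returning the template names, not just a flag, lets the person reviewing the list see at a glance whether a page carries a stub template or something more substantial.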
Talk page is a redirect?
  • If a page has a dab template, then it should have its own talk page. Because of page moves, a dab's talk page often redirects elsewhere (no such redirect should be present). A listing of dab pages without their own talk page, that is, whose talk page is a redirect, would be helpful.
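One way to produce that listing is sketched below, assuming the dump has already been loaded into a mapping from page title to wikitext and that the dab pages are in the main namespace, so their talk pages are titled `Talk:<title>`. The function name `dab_talk_redirects` is hypothetical, and the redirect test is a simplified check for a leading `#REDIRECT [[`.

```python
import re
from typing import Dict, Iterable, List

# Simplified redirect test: optional leading whitespace, then #REDIRECT [[...
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[", re.IGNORECASE)


def dab_talk_redirects(pages: Dict[str, str], dab_titles: Iterable[str]) -> List[str]:
    """Return dab titles whose talk page exists but is a redirect.

    `pages` maps full titles (e.g. 'Talk:Mercury') to wikitext; dab pages
    are assumed to be main-namespace pages.
    """
    flagged = []
    for title in dab_titles:
        talk = pages.get("Talk:" + title)
        if talk is not None and REDIRECT_RE.match(talk):
            flagged.append(title)
    return flagged
```

Dab pages with no talk page at all are not flagged here, since a missing talk page does not point anywhere else.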
Link checking...
  • Check the ratio of wikilinks to the number of lines on the page. The idea is that, generally, the higher the value, the more the page is in need of cleanup.
  • Check for piping of links. Piping should generally not be present on dab pages, so perhaps check the total number of piped links.
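Both link checks can be computed in one pass over a page's wikitext. The sketch below makes several assumptions that are not part of the proposal: "lines" means non-blank lines, image, file and category links are not counted as wikilinks, and a link counts as piped if its target contains `|`. The function name `link_stats` is hypothetical, and no threshold for "in need of cleanup" is implied.

```python
import re
from typing import Tuple

# Innermost [[...]] links; nested links inside image captions are found,
# but the enclosing image link itself is not.
LINK_RE = re.compile(r"\[\[([^\[\]]+)\]\]")
# Assumption: these prefixes are not ordinary article links.
NON_ARTICLE_PREFIXES = ("image:", "file:", "category:")


def link_stats(wikitext: str) -> Tuple[float, int]:
    """Return (wikilinks per non-blank line, number of piped links)."""
    links = [
        target.strip()
        for target in LINK_RE.findall(wikitext)
        if not target.strip().lower().startswith(NON_ARTICLE_PREFIXES)
    ]
    lines = [line for line in wikitext.splitlines() if line.strip()]
    ratio = len(links) / len(lines) if lines else 0.0
    piped = sum(1 for target in links if "|" in target)
    return ratio, piped
```

For example, a five-line page with three entry links, one of them written as `[[Mercury (element)|Mercury]]`, gives a ratio of 0.6 and one piped link.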