
Wikipedia talk:WikiProject Deletion sorting/Accuracy reports

From Wikipedia, the free encyclopedia

Accuracy report

I had a little extra time today, so I went through the full Aug. 20 output and checked it for errors. By my count, the program sorted correctly about 110 times and incorrectly about 20 times (give or take a few). This is very promising, but it also means there are significant hurdles to clear before the process can be fully automated: I would say a 10% error rate is the most we could possibly accept, and we're currently at about 19%.

Most of the errors are due either to:

  • shakily defined sortpages (for instance, I counted it as an error when it sorted a book under "Writing" as well as "Publications"), or
  • keywords with multiple meanings (such as a professor at Washington University being sorted into "Washington").
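To illustrate the second kind of error, here is a minimal Python sketch of naive keyword sorting patched with hand-maintained exclusion phrases, the kind of fine-tuning that has to be repeated for every ambiguous keyword. The `KEYWORDS` and `EXCLUSIONS` tables and the `sortpages_for` function are hypothetical; they are not the actual program's keyword set or code.

```python
import re

# Hypothetical keyword table: sortpage -> trigger words (lowercase).
KEYWORDS = {
    "Washington": ["washington", "seattle", "spokane"],
    "Education": ["university", "professor", "college"],
}

# Hypothetical exclusions: longer phrases that contain a keyword but should
# NOT trigger that sortpage (e.g. "Washington University" is in Missouri).
EXCLUSIONS = {
    "Washington": ["washington university", "george washington", "washington, d.c."],
}


def sortpages_for(text: str) -> set[str]:
    """Return the sortpages whose keywords match the text after exclusions."""
    lowered = text.lower()
    hits = set()
    for page, words in KEYWORDS.items():
        # Blank out excluded phrases so the keyword inside them cannot match.
        masked = lowered
        for phrase in EXCLUSIONS.get(page, []):
            masked = masked.replace(phrase, " ")
        # Whole-word match so "washingtonian" does not count as "washington".
        if any(re.search(r"\b" + re.escape(w) + r"\b", masked) for w in words):
            hits.add(page)
    return hits


if __name__ == "__main__":
    print(sortpages_for("John Doe is a professor at Washington University in St. Louis."))
    print(sortpages_for("A small bakery in Spokane, Washington."))
```

Every new false positive needs another exclusion entry, which is why this style of fix scales so poorly.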

The first we can deal with fairly easily (see above); the second requires either a) an incalculable amount of fine-tuning, or b) a much more sophisticated approach. I'm pinning my hopes on b), and am working on a corpus of sorted stubs for automated keyword extraction; however, I'm also working on a) as time permits.
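One possible way to do automated keyword extraction from a corpus of already-sorted stubs is to score each word by how much more often it appears in stubs filed under a sortpage than in stubs filed elsewhere. The sketch below uses an add-one smoothed log ratio of document frequencies; the technique, the `extract_keywords` name, and its `top_n`/`min_count` parameters are assumptions for illustration, not a description of the corpus work mentioned here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(corpus: dict[str, list[str]], top_n: int = 5,
                     min_count: int = 2) -> dict[str, list[tuple[str, float]]]:
    """corpus maps sortpage -> list of stub texts.

    Returns sortpage -> [(word, score), ...], highest score first, where the
    score is log(P(word in stub | this sortpage) / P(word in stub | others)).
    """
    # Document frequency per sortpage: each word counted once per stub.
    df = {page: Counter() for page in corpus}
    for page, stubs in corpus.items():
        for stub in stubs:
            df[page].update(set(tokenize(stub)))

    total_df = Counter()
    for counts in df.values():
        total_df.update(counts)
    total_docs = sum(len(stubs) for stubs in corpus.values())

    result = {}
    for page, counts in df.items():
        in_n = len(corpus[page])
        out_n = total_docs - in_n
        scored = []
        for word, c_in in counts.items():
            if c_in < min_count:
                continue  # too rare in this sortpage to trust
            c_out = total_df[word] - c_in
            # Add-one smoothing keeps unseen-elsewhere words finite.
            p_in = (c_in + 1) / (in_n + 2)
            p_out = (c_out + 1) / (out_n + 2)
            scored.append((word, math.log(p_in / p_out)))
        # Highest score first; ties broken alphabetically for stable output.
        scored.sort(key=lambda ws: (-ws[1], ws[0]))
        result[page] = scored[:top_n]
    return result
```

Common words such as "the" score near zero or below because they occur under every sortpage, while `min_count` drops words seen in a single stub, whose scores would be noisy.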

Suggestions for improvements to the approach are welcome. -- Visviva 13:07, 21 August 2006 (UTC)

Update: Progress! Following some tweaks to the searching routine and sortpage structure, I counted 116 reasonable placements and 15 unreasonable ones today (from the Aug. 21 AfDs). That's 15 errors out of 131 placements, an error rate of about 11.5%, which is almost down to the 10% threshold. Four of the errors came from the eternally problematic "Lists" and "Words" sortpages -- if we left those out, we'd already have an error rate of under 10%.
A total of 8 errors were due to egregious flaws in the Wikipedia category structure, but such problems are probably inevitable. -- Visviva 12:17, 22 August 2006 (UTC)
The last two days have had error rates below 10%, though at the cost of reduced output (nearly half of the AfDs have gone unsorted). -- Visviva 17:42, 24 August 2006 (UTC)
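The trade-off between accuracy and output can be pictured as a score threshold: an AfD is only filed under a sortpage when its match score clears the bar, so raising the bar removes doubtful placements but leaves more AfDs unsorted. This is a hypothetical sketch that assumes the sorter produces a numeric score per sortpage; the source does not say how the program actually decides, and `assign` and `coverage_at` are illustrative names.

```python
def assign(scores: dict[str, float], threshold: float) -> list[str]:
    """Return the sortpages whose score reaches the threshold (maybe none)."""
    return sorted(page for page, score in scores.items() if score >= threshold)


def coverage_at(all_scores: list[dict[str, float]], threshold: float) -> float:
    """Fraction of AfDs that get at least one sortpage at this threshold."""
    if not all_scores:
        return 0.0
    sorted_count = sum(1 for scores in all_scores if assign(scores, threshold))
    return sorted_count / len(all_scores)
```

Sweeping `threshold` over a hand-checked day of AfDs and recording accuracy and coverage at each value is one way to pick a cut-off that keeps errors under 10% without leaving half the list unsorted.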
Update Aug. 26: Various improvements to the code, which I won't burden this page with. The Aug. 25 sort had (by my count) about 178 correct sortings and 15 clearly incorrect ones; 38 pages were left unsorted. That's better than 90% accuracy and around 75% overall coverage -- not too shabby. Now I just need to actually sort them all. Where's a bot when you need one... :-)
This program is only as smart as its keyword set. At present the keyword set is entirely hand-built, and accordingly clunky and incomplete. I'm hoping to bring in data from my corpus of stubs soon -- that should allow substantial improvements in accuracy and coverage.
(I should add that I'm counting a sorting as accurate if it falls under the rubric of "X-related" deletions -- in other words, a web applications company is website-related, although not actually a website. This is in line with Template:Deletionlist.) -- Visviva 03:22, 26 August 2006 (UTC)
Update Aug. 27: Without any ad hoc changes, the sort of yesterday's AfDs showed only 11 clear errors out of 158 sorting decisions, for an accuracy of about 93%. Coverage was 97 out of 129 articles, or about 75%. -- Visviva 05:54, 27 August 2006 (UTC)
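For reference, the two figures reported in these updates can be computed as below. Accuracy is correct decisions divided by all sorting decisions, and coverage is sorted articles divided by articles scanned; since a single AfD can be filed under more than one sortpage, the number of decisions (158) can exceed the number of articles (129). The function names are illustrative only.

```python
def accuracy(errors: int, decisions: int) -> float:
    """Share of sorting decisions that were correct."""
    return (decisions - errors) / decisions


def coverage(sorted_articles: int, scanned_articles: int) -> float:
    """Share of scanned AfD articles that received at least one sortpage."""
    return sorted_articles / scanned_articles


if __name__ == "__main__":
    # Aug. 27 figures: 11 errors in 158 decisions; 97 of 129 articles sorted.
    print(f"accuracy {accuracy(11, 158):.0%}, coverage {coverage(97, 129):.0%}")
    # prints "accuracy 93%, coverage 75%"
```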
August 28: With some ongoing tweaks to the keyword set and the code, the final tally was 5 errors (maybe I'm being too generous?) out of 170 decisions (~97% accuracy). Coverage was 99 out of 118 (~84%). However, the original run, with buggy code and untested keywords, had about 5 more errors (~93% accuracy) and 5 fewer inclusions (~80% coverage). Time will tell which is closer to the mark. -- Visviva 06:12, 28 August 2006 (UTC)
August 29 (Aug. 28 AfDs): The initial run gives about 11 errors out of 161 decisions, and 102 inclusions out of 138 articles scanned, for about 93% accuracy and 74% coverage. Despite continuing tweaks to the code, we seem to be reaching something of a plateau. Errors continue to be concentrated in the larger and messier topical lists, especially Business, Sexuality, and Military (these three alone account for 6 of the 11 errors). -- Visviva 08:05, 29 August 2006 (UTC)
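To see which sortpages the errors cluster in, a per-sortpage tally is enough. This hypothetical sketch assumes each sorting decision has been recorded as a `(sortpage, is_correct)` pair after manual checking; it reports error counts and per-sortpage error rates, plus the share of all errors contributed by a chosen group of sortpages. The function names are illustrative.

```python
from collections import Counter
from collections.abc import Iterable


def errors_by_sortpage(
    decisions: Iterable[tuple[str, bool]],
) -> list[tuple[str, int, int, float]]:
    """Return (sortpage, errors, decisions, error_rate) rows, most errors first."""
    totals = Counter()
    errors = Counter()
    for page, is_correct in decisions:
        totals[page] += 1
        if not is_correct:
            errors[page] += 1
    rows = [(p, errors[p], totals[p], errors[p] / totals[p]) for p in totals]
    # Most errors first; ties broken alphabetically.
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows


def share_of_errors(rows: list[tuple[str, int, int, float]], pages: set[str]) -> float:
    """Fraction of all errors that fall in the given sortpages."""
    total = sum(r[1] for r in rows)
    if total == 0:
        return 0.0
    return sum(r[1] for r in rows if r[0] in pages) / total
```

Looking at the per-sortpage error rate as well as the raw count helps separate lists that are genuinely messy from lists that simply receive many decisions.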