Wikipedia:Bots/Requests for approval/CopyvioHelperBot

CopyvioHelperBot

Automatic or Manually Assisted: Entirely manual

Programming Language(s): Perl

Function Summary: Finds in-article copyvios and notifies operator.

Edit period(s) (e.g. Continuous, daily, one time run): No actual editing, any edits under the account are done by the operator at the request of the script.

Edit rate requested: 1 edit per minute

Already has a bot flag (Y/N): No

Function Details: The script Googles the first 15 words of each paragraph and lists any matching URLs, excluding a whitelist of mirrors. The operator then checks which direction the copying went (if it is a copyvio at all) and makes the appropriate changes.
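The two mechanical steps described above (building a query from the first 15 words of each paragraph, and dropping hits from whitelisted mirrors) could be sketched roughly as follows. This is an illustrative Python sketch, not the bot's actual Perl code. The function names, the blank-line paragraph split, and the host-based whitelist matching are all assumptions, and the search step itself is left out because the request does not say how Google is queried.

```python
from urllib.parse import urlparse


def paragraph_queries(text, n_words=15):
    """Return the first n_words words of each non-empty paragraph.

    Assumption: paragraphs are separated by blank lines.
    """
    queries = []
    for para in text.split("\n\n"):
        words = para.split()
        if words:
            queries.append(" ".join(words[:n_words]))
    return queries


def drop_mirrors(urls, mirror_hosts):
    """Keep only URLs whose host is not in the mirror whitelist.

    Assumption: the whitelist is a set of bare host names
    (lower case, without a leading "www.").
    """
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host not in mirror_hosts:
            kept.append(url)
    return kept
```

Each string returned by paragraph_queries would be searched as a phrase, and the resulting URLs passed through drop_mirrors before being shown to the operator, who still makes the final call.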

Discussion

Where does it get the list of pages to check? Does it just iterate through all pages here? That's a ton of requests. Also, WP:CP tends to be backlogged already. On the other hand, it would be much more server hoggish than my second bot, and our caching and slaves do go a long way, and the data could be useful. Still, it would be nice if the bot only looked at smaller non-stub pages (the ones that aren't so active, which tend to have lifted text) or maybe focused more on patrolling new pages (from the log perhaps?) in real time for incoming copyright vios. Voice-of-All 18:24, 20 December 2006 (UTC)

Sorry. Each time I run it, it checks one page (found with Special:Random). I then delete the problem sections (or if it's the whole article, then I {{db-copyvio}} it). -- Chris is me 18:34, 20 December 2006 (UTC)

There are over 1.5M articles. How many of them contain blatant copyvio that can be detected by your bot? Perhaps the bot should mostly check Special:Newpages and "problem" categories such as WP:WFY, WP:CBM, and CAT:NOCAT? Or will it process those quite easily, leaving you time to check random pages? MaxSem 20:43, 21 December 2006 (UTC)