Wikipedia:Bots/Requests for approval/CopyvioHelperBot

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Measure (talk | contribs) at 18:34, 10 January 2007 (Discussion). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Operator: Chris is me

Automatic or Manually Assisted: Entirely manual

Programming Language(s): Perl

Function Summary: Finds in-article copyvios and notifies operator.

Edit period(s) (e.g. Continuous, daily, one time run): No automatic editing; any edits under the account are made by the operator when prompted by the script.

Edit rate requested: 1 edit per minute

Already has a bot flag (Y/N): No

Function Details: The script Googles the first 15 words of each paragraph and lists any matching URLs, excluding a whitelist of mirrors. The operator then checks which direction the copyvio goes (if it is one at all) and makes appropriate changes.
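The lookup step described above can be sketched roughly as follows. This is an illustrative Python sketch only (the actual script is written in Perl and is not shown here); the function names and the mirror whitelist entries are made up for the example:

```python
import re
from urllib.parse import urlparse

# Hypothetical whitelist of known Wikipedia mirrors to ignore in search results
MIRROR_WHITELIST = {"en.wikipedia.org", "answers.com"}

def search_snippets(article_text, n_words=15):
    """Return the first n_words of each paragraph as quoted search queries."""
    snippets = []
    for para in re.split(r"\n\s*\n", article_text.strip()):
        words = para.split()
        if len(words) >= n_words:
            # Quote the phrase so the search engine matches it exactly
            snippets.append('"' + " ".join(words[:n_words]) + '"')
    return snippets

def filter_mirrors(urls):
    """Drop search results hosted on known mirrors of Wikipedia."""
    return [u for u in urls if urlparse(u).netloc not in MIRROR_WHITELIST]
```

Each quoted snippet would then be submitted to the search engine, and any surviving hits reported to the operator for a manual check of which side copied which.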

Discussion

Where does it get the list of pages to check? Does it just iterate through all pages here? That's a ton of requests. Also, WP:CP tends to be backlogged already. On the other hand, it would be much more server-hoggish than my second bot, and our caching and slaves do go a long way, and the data could be useful. Still, it would be nice if the bot only looked at smaller non-stub pages (the less active ones, which tend to have lifted text) or maybe focused more on patrolling new pages (from the log perhaps?) in real time for incoming copyright vios. Voice-of-All 18:24, 20 December 2006 (UTC)

Sorry. Each time I run it, it checks one page (found with Special:Random). I then delete the problem sections (or if it's the whole article, then I {{db-copyvio}} it). -- Chris is me 18:34, 20 December 2006 (UTC)
There are over 1.5M articles. How many of them contain blatant copyvio that can be detected by your bot? Probably the bot should mostly check Special:Newpages and "problem" categories such as WP:WFY, WP:CBM, and CAT:NOCAT? Or will it process those quickly enough that you'll still have time to check random pages? MaxSem 20:43, 21 December 2006 (UTC)
You'd be surprised how many in-article copyvios there are. I could make it scan newpages, but (1) I don't know very much Perl (I didn't write the thing) and (2) it's not automatic; I just run it when I feel like it, and if there's a copyvio, I remove it. Wait, I need a bot flag for that? 66.82.9.80 04:08, 22 December 2006 (UTC) This post was made by -- Chris is me (user/review/talk) when he was unable to log in
I've been running the scanner sporadically and now average something like 1 copyvio per 15 articles. This is bad, really bad. -- Chris is me 04:18, 26 December 2006 (UTC)
It would be useful to have some results to review. Also, you may like to consider downloading a database dump and driving your bot off that; then the server issues vanish. Rich Farmbrough, 22:45 28 December 2006 (GMT).
Indeed, either just check newpages or download a dump and iterate through allpages (rather than at random). Voice-of-All 04:26, 31 December 2006 (UTC)
I agree with Voice-of-All on what to use this bot for. I actually don't think you need bot approval if you're manually making and monitoring every edit anyway. If you're not sure how to modify your bot to run automatically in the way people are describing, you could ask for help at Wikipedia:Bot requests. Vicarious 03:27, 1 January 2007 (UTC)
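The dump-driven approach suggested above (reading pages from a downloaded database dump instead of hitting Special:Random) could look roughly like this. Again a hedged Python sketch, not the operator's Perl script; MediaWiki XML dumps are a stream of <page> elements, each containing a <title> and revision <text>:

```python
import xml.etree.ElementTree as ET

def iter_pages(xml_source):
    """Yield (title, wikitext) pairs from a MediaWiki XML dump stream."""
    for _, elem in ET.iterparse(xml_source):
        # Dumps carry an XML namespace; compare only the local tag name
        if elem.tag.rsplit('}', 1)[-1] == "page":
            title = text = None
            for child in elem.iter():
                name = child.tag.rsplit('}', 1)[-1]
                if name == "title":
                    title = child.text
                elif name == "text":
                    text = child.text
            yield title, text
            elem.clear()  # free memory; full dumps are many gigabytes
```

Feeding each yielded page into the paragraph-search step would let the scanner walk allpages in order with no live server requests at all.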

What about situations where, say, a reporter for a major newspaper copies a wiki article without giving credit? Wouldn't this bot, even user-assisted, have a high chance of marking the article in that case? --Measure 21:14, 9 January 2007 (UTC)

Never mind. I see my misunderstanding came from a poor reading of how this bot would be used. --Measure 18:34, 10 January 2007 (UTC)