Wikipedia:Bots/Requests for approval/CopyvioHelperBot
Operator: Chris is me
Automatic or Manually Assisted: Entirely manual
Programming Language(s): Perl
Function Summary: Finds in-article copyvios and notifies operator.
Edit period(s) (e.g. Continuous, daily, one time run): No actual editing; any edits under the account are made by the operator when prompted by the script.
Edit rate requested: 1 edit per minute
Already has a bot flag (Y/N): No
Function Details: The script Googles the first 15 words of each paragraph and lists any matching URLs, excluding a whitelist of mirrors. The operator then checks which direction the copying went (if it is a copyvio at all) and makes appropriate changes.
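The steps described above can be sketched roughly as follows. This is an illustrative Python sketch, not the operator's Perl script: the function names, the blank-line paragraph split, the quoted-phrase query form, and the `search` callable (a stand-in for whatever performs the web search) are all assumptions made for the example.

```python
from urllib.parse import urlparse


def paragraph_queries(article_text, n_words=15):
    """Return one quoted search phrase per paragraph: its first n_words words.

    Assumption: paragraphs are separated by blank lines.
    """
    queries = []
    for paragraph in article_text.split("\n\n"):
        words = paragraph.split()
        if words:
            queries.append('"' + " ".join(words[:n_words]) + '"')
    return queries


def drop_mirrors(urls, mirror_hosts):
    """Remove URLs whose host is a whitelisted mirror or a subdomain of one."""
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if not any(host == m or host.endswith("." + m) for m in mirror_hosts):
            kept.append(url)
    return kept


def scan(article_text, search, mirror_hosts):
    """Map each query phrase to its non-mirror hits.

    `search` is a hypothetical caller-supplied callable that takes a query
    string and returns a list of URLs; no real search API is assumed here.
    """
    report = {}
    for query in paragraph_queries(article_text):
        hits = drop_mirrors(search(query), mirror_hosts)
        if hits:
            report[query] = hits
    return report
```

The output is only a list of candidate URLs per paragraph. Deciding whether a match is a copyvio, and in which direction the text was copied, remains with the operator.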
Discussion
Where does it get the list of pages to check? Does it just iterate through all pages here? That's a ton of requests. Also, WP:CP tends to be backlogged already. On the other hand, it would be much more server hoggish than my second bot, and our caching and slaves do go a long way, and the data could be useful. Still, it would be nice if the bot only looked at smaller non-stub pages (the ones that aren't so active, which tend to have lifted text) or maybe focused more on patrolling new pages (from the log perhaps?) in real time for incoming copyright vios. Voice-of-All 18:24, 20 December 2006 (UTC)
- Sorry. Each time I run it, it checks one page (found with Special:Random). I then delete the problem sections (or if it's the whole article, then I {{db-copyvio}} it). -- Chris is me 18:34, 20 December 2006 (UTC)
- There are over 1.5M articles. How many of them contain blatant copyvio that can be detected by your bot? Probably, the bot should check mostly Special:Newpages and "problem" categories such as WP:WFY WP:CBM CAT:NOCAT? Or will it process them quite easily, leaving you time to check random pages? MaxSem 20:43, 21 December 2006 (UTC)
- You'd be surprised how many in-article copyvios there are. I could make it scan newpages, but (1) I don't know very much Perl (I didn't write the thing) and (2) it's not automatic; I just run it when I feel like it, and if there's a copyvio, I remove it. Wait, I need a bot flag for that? 66.82.9.80 04:08, 22 December 2006 (UTC) This post was made by -- Chris is me (user/review/talk) when he was unable to log in
- I've been running the scanner sporadically and currently average something like 1 copyvio per 15 articles. This is bad, weel bad. -- Chris is me 04:18, 26 December 2006 (UTC)