Wikipedia:Bots/Requests for approval/KrimpBot
Automatic or Manually Assisted: Automatic, unsupervised
Programming Language(s): Python, using the pywikipedia framework
Function Summary: Will tag the talk pages of open Tor exit nodes to indicate their open status, and tag blocked former nodes, with the relevant categories
Edit period(s) (e.g. Continuous, daily, one time run): Continuous
Edit rate requested: 4 edits per minute
Already has a bot flag (Y/N): N
Function Details: http://hemlock.ts.wikimedia.org/~krimpet/torlist.txt is an automatically generated list of Tor nodes exiting to the WMF servers; I have a small program that runs every 6 hours via cron job that queries the authoritative directory, filters out nodes whose exit policy blocks access to the WMF IP ranges (as well as restrictive exit policies like *:80, *:443, and *:*), and writes the remaining nodes to this file.
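(By way of illustration only — this is not the actual generator — an exit-policy check boils down to walking each node's policy top-down until a rule matches. The WMF address prefix, the example policy, and all names below are assumptions, and CIDR masks and port ranges are glossed over:)

# Sketch of the exit-policy walk, assuming policies are lists of
# "accept"/"reject" rules already parsed out of the node descriptors.
WMF_PREFIX = '145.97.39.'  # assumed WMF address block, for illustration

def policy_accepts(policy_lines, port):
    # Tor evaluates exit policies top-down; the first matching rule wins.
    for rule in policy_lines:
        verb, pattern = rule.split(None, 1)
        addr, rule_port = pattern.rsplit(':', 1)
        addr_matches = addr == '*' or addr.startswith(WMF_PREFIX)
        port_matches = rule_port == '*' or rule_port == str(port)
        if addr_matches and port_matches:
            return verb == 'accept'
    return False  # unmatched traffic is treated as rejected

# A node only counts as an open exit if it can reach the wiki on both ports:
policy = ['reject 10.0.0.0/8:*', 'accept *:80', 'accept *:443', 'reject *:*']
exits_to_wmf = policy_accepts(policy, 80) and policy_accepts(policy, 443)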
I would like to propose a bot, written in Python with pywikipedia and also running on my toolserver account, that uses this list to identify active Tor exits on their talk pages, as well as identify IPs that are no longer Tor but still blocked, so that administrators can block/unblock if needed. It would do this simply by tagging IP talk pages with an appropriate category (perhaps Category:Tor exit nodes and Category:Blocked former Tor exit nodes?) and removing the category when it finds it no longer applies. (I also hope to make this bot portable across WMF wikis as well, in case other projects want to use it.)
Discussion
- Comment. This seems like a very useful function that would help bring a healthy share each of order and accuracy to an often muddled area of wiki administration. Vassyana (talk) 06:56, 20 February 2008 (UTC)
- Comment - as part of checkuser duties, I check and block tor nodes all the time. This function would be invaluable - Alison ❤ 07:03, 20 February 2008 (UTC)
- Comment - This is indeed useful Krimpet. Compwhiz II(Talk)(Contribs) 11:42, 20 February 2008 (UTC)
- Which templates will it use for blocked nodes? Unblocked nodes? This would be really helpful :) SQLQuery me! 12:06, 20 February 2008 (UTC)
- I foresee it using user categories, rather than templates. For example, current nodes would be tagged with Category:Tor exit nodes. Every few hours as the list is updated, it would compare the members of that category against the live list, and remove the category from IPs that are no longer Tor; additionally, if someone had blocked the IP as Tor, it would tag it with Category:Blocked former Tor exit nodes. krimpet✽ 19:55, 20 February 2008 (UTC)
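(Schematically, that comparison is a set difference between the live list and the category members; the sketch below uses hypothetical helpers — is_blocked, tag, untag, retag_as_former — standing in for the actual page edits:)

# Sketch of the periodic comparison; 'torlist' and 'tagged' are assumed
# to be sets of IP strings (the live list and the category members).
current = set(torlist)
stale = tagged - current      # tagged IPs that have left the Tor network
missing = current - tagged    # listed IPs whose talk pages lack the tag
for ip in stale:
    if is_blocked(ip):        # hypothetical helper querying list=blocks
        retag_as_former(ip)   # swap to Category:Blocked former Tor exit nodes
    else:
        untag(ip)             # remove the category outright
for ip in missing:
    tag(ip)                   # add Category:Tor exit nodes to the talk page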
- Thanks :) I misread; I thought I saw something about templates. Those cats sound about right, and wouldn't interfere (would help nicely!) with my similar efforts. SQLQuery me! 01:46, 21 February 2008 (UTC)
- I see no point in blocking Tor nodes with no edits to Wikipedia. It's like randomly killing someone because they *might* be a threat later on. Mønobi 03:39, 21 February 2008 (UTC)
- This bot is not designed to block Tor nodes -- there have been proposals to do so in the past with adminbots, but that is not what this bot is intended to do. Rather, it allows users and admins to clearly identify which IPs truly are and aren't Tor, to replace the patchy system of guesswork that we currently have, and in the event of abuse to know where it's coming from. krimpet✽ 04:05, 21 February 2008 (UTC)
- I know, but certainly you'll tag tor nodes that have made zero edits to wikipedia. Mønobi 22:14, 21 February 2008 (UTC)
- That is a good point; is there a specific benefit to tagging the talk pages of IPs that haven't ever edited? Is that something you'd be willing to work into the code? I mean, you're running constantly anyhow; there's no real harm, that I can see, in holding off until that IP has actually edited. SQLQuery me! 03:04, 22 February 2008 (UTC)
- Only tagging IPs with edits greatly lessens its utility for checkusers, though: if someone abuses the node while logged in but doesn't edit anonymously, checkusers wouldn't be able to easily check an IP's talk page to see whether it's an exit node or not. krimpet✽ 03:58, 22 February 2008 (UTC)
- For some perspective, we've had a seriously abusive banned editor using TOR to register a bunch of accounts to get past the ACB rangeblocks that have been set up. A system where we can tag and check TOR would be seriously useful in dealing with this guy considering those accounts have basically made "zero edits" when you look at their contribs - Alison ❤ 04:08, 22 February 2008 (UTC)
- Click 'What links here', on the IP's userpage :) A couple/few of us maintain up-to-date lists of TOR nodes that allow Wikipedia exit, without tagging the IP's talkpage. I'm not opposed to doing this, for the record, by the way. SQLQuery me! 04:19, 22 February 2008 (UTC)
- The talk page does have some benefits over a list though: one, it means the talk page will show up as a blue link (e.g. User:Krimpet/Fake link), standing out in CU results; and two, the talk page history will make it easy to see the IP's history as a node, as opposed to a list where all the bot's updates would be clumped together. krimpet✽ 04:24, 22 February 2008 (UTC)
- Agreed, that'd make it a lot easier to track the 'flip-floppers'. SQLQuery me! 04:27, 22 February 2008 (UTC)
- Well, either way, I'd love to see a trial. What does everyone else think? I was thinking maybe 3 days. (I'm not in BAG, so please note that I cannot technically approve trials.) SQLQuery me! 04:32, 22 February 2008 (UTC)
- Indeed, sounds interesting.
Approved for trial (250 edits or 7 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. - 7 days/250 edits sounds more than reasonable; it's a fairly low-risk bot. Presumably the code is still to be written? Whether it needs writing or not, can you please post a link to a copy of the code (if that is OK with you), so other users can look it over as necessary? —Reedy Boy 15:58, 25 February 2008 (UTC)
- Ah, thanks, I've got it up and running now :) The code is here, and I welcome any suggestions or improvements. krimpet✽ 06:49, 26 February 2008 (UTC)
- Quite frankly, your code is a waste of system resources (I mean in regards to the TS). It should not be in a while loop; you should really use crontab. I've cleaned it up a bit to include exceptions. You'll have to place it on the crontab. Here is the source:
import wikipedia, catlib, pagegenerators

def main():
    site = wikipedia.getSite()
    torfile = '/home/krimpet/public_html/torlist.txt'
    torcat = u'Tor exit nodes'
    formercat = u'Blocked former Tor exit nodes'

    cat = catlib.Category(site, u'Category:%s' % (torcat))
    gen = pagegenerators.CategorizedPageGenerator(cat)
    torlist = open(torfile, 'r').readlines()

    # Pass 1: tag every IP on the current exit list that isn't tagged yet.
    for line in torlist:
        if line[0] != '#' and len(line.strip()) > 0:
            ip = line.strip()
            page = wikipedia.Page(site, u"User talk:%s" % (ip))
            istor = False
            for pagecat in page.categories():
                if pagecat.titleWithoutNamespace() == torcat:
                    istor = True
            if not istor:
                wikipedia.output(u"tagging %s" % (ip))
                text = u'\n[[Category:%s]]' % (torcat)
                try:
                    text = page.get(get_redirect=True)
                    if text.find(formercat) != -1:
                        # Was tagged as a blocked former node; swap it back.
                        text = text.replace(formercat, torcat)
                    else:
                        text += u'\n[[Category:%s]]' % (torcat)
                except wikipedia.NoPage:
                    pass  # page doesn't exist yet; create it with just the tag
                page.put(text, u"Tagging %s as a currently open Tor node" % (ip))
            else:
                wikipedia.output(u"%s is already tagged" % (ip))

    # Pass 2: for tagged IPs that dropped off the list, either retag them
    # as blocked former nodes (if still blocked) or untag them entirely.
    for page in gen:
        ip = page.titleWithoutNamespace()
        hit = False
        for line in torlist:
            if line.strip() == ip:
                hit = True
        if not hit:
            try:
                text = page.get(get_redirect=True)
                # Crude block check: list=blocks only echoes the IP back
                # in its output when an active block exists.
                if site.getUrl(u'/w/api.php?action=query&format=xml&list=blocks&bkusers=%s' % (ip)).find(ip) != -1:
                    wikipedia.output(u"retagging %s as blocked former exit" % (ip))
                    text = text.replace(torcat, formercat)
                    page.put(text, u"Tagging %s as a blocked former Tor node" % (ip))
                else:
                    wikipedia.output(u"untagging %s" % (ip))
                    text = text.replace(u'\n[[Category:%s]]' % (torcat), u'')
                    text = text.replace(u'[[Category:%s]]\n' % (torcat), u'')
                    text = text.replace(u'[[Category:%s]]' % (torcat), u'')
                    page.put(text, u"Untagging %s as no longer an open Tor node" % (ip))
            except wikipedia.Error:
                wikipedia.output(u"Error on [[%s]], skipping" % page.title())
                continue
        else:
            wikipedia.output(u"%s is still open" % (ip))

    # Pass 3: untag former nodes whose blocks have since been lifted.
    cat = catlib.Category(site, u'Category:%s' % formercat)
    gen = pagegenerators.CategorizedPageGenerator(cat)
    for page in gen:
        ip = page.titleWithoutNamespace()
        if site.getUrl(u'/w/api.php?action=query&format=xml&list=blocks&bkusers=%s' % (ip)).find(ip) == -1:
            wikipedia.output(u"untagging %s" % (ip))
            try:
                # Fetch the current page text before stripping the category.
                text = page.get(get_redirect=True)
                text = text.replace(u'\n[[Category:%s]]' % (formercat), u'')
                text = text.replace(u'[[Category:%s]]\n' % (formercat), u'')
                text = text.replace(u'[[Category:%s]]' % (formercat), u'')
                page.put(text, u"Untagging %s as no longer a blocked former Tor node" % (ip))
            except wikipedia.NoPage:
                continue
            except wikipedia.Error:
                wikipedia.output(u"Error while saving [[%s]]" % page.title())
                continue

if __name__ == '__main__':
    try:
        main()
    finally:
        wikipedia.stopme()
That should work. It worked for me, at least. You should also use regexes for the replacements, but I'm not too familiar with them, so I didn't include it myself. Mønobi 23:27, 27 February 2008 (UTC)
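(A regex version of those category removals might look like the following sketch — strip_category is a hypothetical helper, not part of the source above:)

import re

def strip_category(text, catname):
    # One pattern covering '\n[[Category:X]]', '[[Category:X]]\n' and the
    # bare tag, mirroring the three literal replace() calls in the source;
    # re.escape guards any special characters in the category name.
    escaped = re.escape(catname)
    pattern = u'\\n\\[\\[Category:%s\\]\\]|\\[\\[Category:%s\\]\\]\\n?' % (escaped, escaped)
    return re.sub(pattern, u'', text)

# e.g.: text = strip_category(text, u'Tor exit nodes')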