
User talk:DumZiBoT/reflinks.py

From Wikipedia, the free encyclopedia

http://en.wikipedia.org/w/index.php?title=User:DumZiBoT/reflinks.py&curid=15117767&diff=202109411&oldid=195635129

I see a lot of problems with that change.

  1. Why did you remove support for named references? With your current version, a named bare reference will be transformed into an unnamed ref.
  2. Removing the "finally" part is wrong. Read this: finally lets us execute code whatever happens in the try block.
  3. The hack for de:Humane_Papillomviren is nonsensical. The whole point of the code is to detect a proper encoding from the meta tags, so that UnicodeDammit is able to decode the text properly (u = UnicodeDammit(linkedpagetext, overrideEncodings = enc)). I don't really see how converting the HTML source to some potentially wrong charset could help find the right charset to decode the page properly...
  4. re.sub(r"(\[\w+://[^][<>\"\s]*?)''", r"\1 ''", new_text) seems very strange to me. Are you sure this is not re.sub(r"(\[\w+://[^\]\[<>\"\s]*?)''", r"\1 ''", new_text)?

Cheers, NicDumZ ~ 19:40, 30 March 2008 (UTC)
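To illustrate point 2 above, here is a minimal sketch of what a finally clause guarantees. It is not taken from reflinks.py: FlakyResource and fetch are hypothetical names made up for this example. The except and finally clauses are nested in two try statements, a form that also parses on Python versions older than 2.5, where a single try statement could not combine except and finally.

```python
class FlakyResource:
    """Hypothetical stand-in for a fetched page handle (not from reflinks.py)."""

    def __init__(self, fail):
        self.fail = fail
        self.closed = False

    def read(self):
        if self.fail:
            raise IOError("simulated read failure")
        return "<html></html>"

    def close(self):
        self.closed = True


def fetch(resource):
    """Return the page text, or None on a read error; always close the resource."""
    try:
        try:
            return resource.read()
        except IOError:
            # The error is handled here, but cleanup is left to the outer finally.
            return None
    finally:
        # Runs after a normal return, after a handled error, and after an
        # exception that propagates out of the inner try.
        resource.close()
```

Calling fetch(FlakyResource(True)) returns None and still leaves the resource closed, which is the behaviour that is lost if the finally part is simply deleted.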

I'll have to admit that this was rather rushed and probably not tested well enough. But it's what I got after merging the toolserver changes with your last changes, and because I needed to copy part of the code to AWB.
  1. I wanted to change it so that .group() excludes <ref> from the match. But it doesn't work... I'm going to have to look at that code again.
  2. As the documentation says, that is for Python >= 2.5; Python 2.4.5 is running on the toolserver.
  3. A Unicode error is raised when the regex is performed. I've since revised the toolserver copy to perform the search using the string from unicode(linkedpagetext, 'ascii', errors='ignore'), as we're just dealing with ASCII-based regular HTML code.
  4. Odd as it may seem, it works the same because an empty [^] is invalid, so the ] right after the ^ is read as a literal character. Tested using the regex engines in C# and Python.
Dispenser 05:14, 31 March 2008 (UTC)
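The equivalence claimed in point 4 can be checked with a short sketch in Python's re module. The two patterns are the ones quoted in this thread; the helper name space_before_italics and the sample strings are assumptions made up for the check, not text from the bot's test data.

```python
import re

# Pattern as committed: the "]" right after "[^" is read as a literal.
ORIGINAL = r"(\[\w+://[^][<>\"\s]*?)''"
# Pattern with both brackets explicitly escaped inside the class.
ESCAPED = r"(\[\w+://[^\]\[<>\"\s]*?)''"


def space_before_italics(pattern, text):
    """Insert a space between a bare external link target and a following ''."""
    return re.sub(pattern, r"\1 ''", text)


# Hypothetical sample inputs, chosen only to exercise the character class.
SAMPLES = [
    "[http://example.org/page''Title'']",  # should gain a space before ''
    "[http://example.org/a]b''c''",        # "]" ends the URL, so no match
    "plain ''italic'' text",               # no external link at all
]

results = [
    (space_before_italics(ORIGINAL, s), space_before_italics(ESCAPED, s))
    for s in SAMPLES
]
```

Each pair in results holds the output of the two patterns for one sample, so comparing the two elements of every pair shows whether the patterns behave alike on these inputs. The escaped form is arguably easier to read, even where both compile to the same character class.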