
Benutzer Diskussion:Stefan Kühn/Check Wikipedia

From Wikipedia, the free encyclopedia
This is an old version of this page, last edited on 10 September 2009 at 22:04 by Archimëa (talk | contribs) (Table max width (new interface)). It may differ significantly from the current version.


Error #003 in Turkish Wikipedia

Good evening. Could you please modify the script so that it not only searches for <references /> but also {{reflist}}? The reason why this is required is that the output shows the valid pages (with reference tags) as if they do not have any reference tags. Thanks! ----Superyetkinileti 20:07, 31 May 2009 (UTC)

Hello Superyetkin, searching for "reflist" is already a feature of the script. Can you tell me the article where the script didn't find the reflist template? I think there is another problem. -- sk 17:48, 1. Jun. 2009 (CEST)
Hello there, sorry for the late reply.
The problem is that this template, which is used in featured articles, is not recognized by the script, and these articles come up with "missing <references />" errors on the project page. You can see the current situation here. Could you please examine the issue and resolve it? Thanks for your help. --Superyetkin 21:10, 9. Jul. 2009 (CEST)
 Ok, I have inserted the "Şablon:Kayan kaynakça". -- sk 21:55, 3. Aug. 2009 (CEST)
Thanks! --Superyetkin 10:03, 4. Aug. 2009 (CEST)

Wrong degree sign

Hello Stefan, could you already create a list for WP:BA #Nummernzeichen statt Gradzeichen? That would be easier than possibly running through the whole database when one isn't using a dump. -- @xqt 08:55, 12. Jun. 2009 (CEST)

I have put it on my to-do list. -- sk 21:59, 3. Aug. 2009 (CEST)
I'd rather leave this out, because I don't know whether it is handled the same way in other languages. Apparently it worked with the bot anyway. -- sk 21:07, 18. Aug. 2009 (CEST)

Error 083 - possible bug (WP in Italian)

Hello, this short message to inform you that the Check Wikipedia script run on Wikipedia in Italian flagged for an error 083 (Headlines start with three "=" and later with level two) on the article "it:Episodi di Pocket Monsters Diamond & Pearl" where actually the first headline starts with two "=" but it is inserted within a "noinclude" pair like that: <noinclude>== Title ==</noinclude>. The article was derived by "stripping" part of the contents from a very long original one and the "noinclude" is useful for a correct handling of nested articles. IMHO, if possible and if this does not conflict with other processing of the script, the "noinclude" should be ignored so that the script detects the logically correct sequence of headline levels. Thank you very much and keep up with this precious job. -- L736E 18:46, 8. Jul. 2009 (CEST)Beantworten

Hello L736E, I think I need this detection of "noinclude" for other things. But I also think this headline inside a noinclude is a bug in the article. At the moment I have no idea how to fix this. -- sk 21:21, 13. Jul. 2009 (CEST)
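
Editorial note: L736E's proposal (ignore the "noinclude" tags so that the logical order of headline levels is checked) can be sketched as follows. This is only an illustration in Python, not the Check Wikipedia script; the function names and the simplified headline pattern are assumptions.

```python
import re

# Opening and closing noinclude tags; only the tags are dropped, the
# content between them is kept.
NOINCLUDE_TAG = re.compile(r"</?noinclude>", re.IGNORECASE)

# Simplified headline pattern: a line that starts with 1-6 "=" signs
# and ends with "=" signs.
HEADLINE = re.compile(r"^(={1,6})[^=].*?=+\s*$", re.MULTILINE)


def headline_levels(wikitext: str) -> list:
    """Return the level of every headline, ignoring noinclude tags."""
    text = NOINCLUDE_TAG.sub("", wikitext)
    return [len(match.group(1)) for match in HEADLINE.finditer(text)]


def starts_with_three_then_two(wikitext: str) -> bool:
    """True if the first headline has level 3 and a level 2 follows later."""
    levels = headline_levels(wikitext)
    return bool(levels) and levels[0] == 3 and 2 in levels[1:]
```

With the tags removed, `<noinclude>== Title ==</noinclude>` counts as a normal level-two headline, so an article that begins this way would no longer be reported.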

Other namespaces

Hi Stefan. I think that the script only searches for errors in namespace 0 (principal). It would be nice if, on eswiki at least, it also searched namespace 104 (Anexo:), which is used for lists and can have the same errors to fix. Can your script scan namespaces 0 and 104 in future runs? Thanks in advance! Muro de Aguas 19:07, 9. Jul. 2009 (CEST)

Hello Muro de Aguas, thanks for this info. I had never heard of this namespace 104. This is very interesting. I will try to include it soon. -- sk 21:24, 13. Jul. 2009 (CEST)
I have put this on my to-do list. -- sk 22:00, 3. Aug. 2009 (CEST)
 Ok, I have included this. -- sk 21:06, 18. Aug. 2009 (CEST)

Error #082 on Swedish Wikipedia

Links starting with "S:", as in [[S:t Lukasstiftelsen]], are not links to another Wikimedia project from the Swedish Wikipedia, because there are many names starting with S:t in Swedish. Best regards! -- Lavallen 21:51, 10. Jul. 2009 (CEST)

Ohh, very interesting. I think this is the short link to Wikisource. What do you use as the short link to Wikisource in the Swedish Wikipedia? -- sk 21:31, 13. Jul. 2009 (CEST)
The Swedish Wikipedia uses "src" as the short link to Wikisource. Elfsborgarn 13:33, 16. Jul. 2009 (CEST)
I see you have deactivated this in svwiki. It would be a lot of work to include this in the script. -- sk 21:10, 18. Aug. 2009 (CEST)

Error #040 in Japanese Wikipedia

Hello, Stefan!

In jawiki there are a lot of HTML font tags, but the script reports none. The tags were reported up to the 2009-01-30 version, but the problem has occurred since the 2009-02-11 version. Best regards! --Mymelo 03:30, 11. Jul. 2009 (CEST)

Thanks for this info. In other languages there are no errors reported either. I will check this in the script. Maybe I can fix this. -- sk 21:35, 13. Jul. 2009 (CEST)
Many thanks for your comment. --Mymelo 13:40, 18. Jul. 2009 (CEST)
 Ok, I have changed the script a little bit. Now I search not only for "<font>" but also for "<font ...". Maybe this helps. We will see tomorrow.
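
Editorial note: the change described here, matching "<font ..." with attributes as well as a bare "<font>", could look roughly like this. It is a minimal Python sketch, not the script's actual code; the name `has_font_tag` is made up for the illustration.

```python
import re

# Matches "<font>" as well as "<font ...>" with attributes. The \b keeps
# tag names such as "fontsize" from matching.
FONT_TAG = re.compile(r"<font\b[^>]*>", re.IGNORECASE)


def has_font_tag(wikitext: str) -> bool:
    """Return True if the text contains an opening HTML font tag."""
    return FONT_TAG.search(wikitext) is not None
```

A search for the literal string "<font>" alone would miss every tag that carries attributes such as color or size, which would be consistent with the empty reports described above.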

Error #033

Hello Stefan. As far as I can see here, there is no wiki syntax routine to replace underlined text (<u>) so what is the use of this error? --Superyetkin 18:08, 13. Jul. 2009 (CEST)Beantworten

See here. Underline is a tag that will not be supported in future versions of HTML. If you really need it in an article, then it should be put in a span. This is XHTML-conformant. -- sk 21:37, 13. Jul. 2009 (CEST)
I'm sorry to interrupt. But that is bullshit. You have no right to force people to use span tags instead of u tags. Mediawiki should simply keep supporting u tags, end of story. Follow the KISS principle. -- chemiewikibm cwbm 22:31, 13. Jul. 2009 (CEST)
You can disable this check on your wiki if you don't want to use it. Just set the priority in the _xx part of the translation text. --Vina 08:51, 14. Jul. 2009 (CEST)
Thanks for the clarification, Stefan. I would really appreciate it if you answered my other query about error #003 above. Cheers! --Superyetkin 23:08, 13. Jul. 2009 (CEST)

False positive #81

Dear Stefan! The article hu:Stadler FLIRT contains some extremely large references with embedded tables. The tables differ in ref#17 to ref#19, but your script reports them as identical. -- Bitman 193.6.17.154 18:58, 16. Jul. 2009 (CEST)

My script removes the tables, and so the references are identical. I have never seen a reference like this. Why do you need this? I think a reference should only contain a link to a source. -- sk 20:21, 16. Jul. 2009 (CEST)

I can't answer that; I'm not an editor of the article. I am writing a modular bot to repair the errors discovered by you. Repairing #81 is quite complicated but not impossible: hu:User:GumiBot/code81. I think exact detection of identical refs may be less hard. :-) -- Bitman 193.6.17.197 07:08, 17. Jul. 2009 (CEST)

Bug from Danish Wikipedia

After editing some refs from #81 in a couple of articles, new ref bugs from the same articles (da:Jehovas Vidner and da:Dansk køkken) appeared, but they weren't added to the articles after I edited them. So the conclusion must be that #81 doesn't catch more than one bug per article at a time :) --Anigif 23:18, 12. Aug. 2009 (CEST)

Yes, this is right. I report only the first duplicate ref, because sometimes there are many of them in one article. -- sk 21:29, 18. Aug. 2009 (CEST)
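
Editorial note: the "first duplicate only" behaviour can be illustrated with a short Python sketch. It is not the script's code; the pattern is simplified (it skips self-closing tags and does not handle a "/" inside attributes), and the function name is an assumption.

```python
import re
from typing import Optional

# Opening <ref> or <ref name="...">, but not <references> and not
# self-closing <ref ... />.
REF = re.compile(r"<ref(?:\s[^>/]*)?>(.*?)</ref>", re.DOTALL | re.IGNORECASE)


def first_duplicate_ref(wikitext: str) -> Optional[str]:
    """Return the body of the first reference that repeats an earlier one."""
    seen = set()
    for match in REF.finditer(wikitext):
        body = match.group(1).strip()
        if body in seen:
            return body  # stop here: later duplicates are not reported
        seen.add(body)
    return None
```

Because the function returns at the first repeat, an article with several duplicated references shows up again after each fix, which matches what Anigif observed.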

Many suggestions

Hello Mr. Kühn,

While editing on the French Wikipedia, I found many possible errors. I list an example of each.

  • HTML entity &#x2200; (∀) should be translated into Unicode character \u2200.
  • HTML entity &#2200; (࢘) should be translated into Unicode character \u0898.
  • Many HTML entities, like &Eacute; (É), should be converted to Unicode. However, &nbsp; must be excluded, since it has legitimate uses. A list of such entities is given by the Web Design Group. I have the full list in a JavaScript file and can send it to you (I use it within my Firefox extension, Weekedit). They are also listed on EN.WP: en:List of XML and HTML character entity references.
  • If the title of the article is PSoC, the sort key should be {{DEFAULTSORT:Psoc}}.
  • A category sort key with a diacritic, like [[Catégorie:Acteur français|Depardieu, Gérard]], is bad.
  • The wikilink [[fractale|fractales]] should be shorter : [[fractale]]s.

Regards,

Cantons-de-l'Est 03:19, 17. Jul. 2009 (CEST)Beantworten

Very interesting ideas. I will try to insert this in my script. -- sk 22:01, 20. Jul. 2009 (CEST)Beantworten
A comment: I don't think the last one is a good idea, because it will increase the number of false positives of a spell-checker. --129.215.104.155 12:36, 21. Jul. 2009 (CEST)Beantworten

Error #003 in Japanese Wikipedia

Hello, Stefan.

In jawiki there are error reports for error #003, but two of the articles already use a reference template.

ja:国際水泳連盟 has the template {{脚注リスト}}, which is a new redirect to the {{Reflist}} template. Please localise your script for Japanese. However, I could not find the problem in ja:八王子市. Best regards. --Mymelo 13:11, 18. Jul. 2009 (CEST)

I will insert this next weekend. -- sk 22:02, 20. Jul. 2009 (CEST)
I will also check ja:八王子市 at the weekend. At the moment I can't find a problem. -- sk 22:05, 20. Jul. 2009 (CEST)
Stefan, could you be so kind as to do the same (update your script) for the Turkish wiki as well? Actually, I had mentioned this before (see my posts above), but you do not seem to have noticed them at all. Thanks for your help. --Superyetkin 22:48, 20. Jul. 2009 (CEST)
There is only one script. It works in all languages. -- sk 22:12, 3. Aug. 2009 (CEST)

Error #69

Hi, there may be a false positive in it:Codice ISBN as one image name contains "ISBN-13". That's my guess, could you check it as well? Marcol-it

I can add this article as an excluded article. I did this with the article ISBN in de. -- sk 22:06, 20. Jul. 2009 (CEST)
Thanks, that will be good! :) Marcol-it 18:08, 21. Jul. 2009 (CEST)
 Ok, I have fixed this. -- sk 22:11, 3. Aug. 2009 (CEST)

We get the same false positive in ca:Lector de codi de barres, and we will get it in ca:ISBN if it gets inspected. Can you please whitelist them? --JoRobot 23:45, 1. Sep. 2009 (CEST)

Suggestions from France

Hello Stefan,
A few requests and suggestions have come from the French project.

References in headline

The French conventions on this point say to insert <ref>xxxxx</ref> in the article text rather than in headlines, near the words or sentences the reference is about. The problem was raised on the French Project:CheckWiki discussion page and agreed on as a suggestion. Do you agree to add this detection (hoping it could be useful for other projects)?

I don't understand the discussion in French. Please describe the problem with an example. Thanks. -- sk 09:04, 4. Aug. 2009 (CEST)
For example, this French article M63 -> look at the first version here -> a reference is in the first headline; it should be placed in a short introductory sentence rather than in this title. The reference should be at the end of the sentence or near the word you want to "support" with it. We'd like to detect this... --II...Richard...II 17:07, 4. Aug. 2009 (CEST)
Other examples can be found here: [1] [2] [3] [4]. I hope they will be useful for you. Regards, 86.68.72.207 06:55, 14. Aug. 2009 (CEST)

Thumb: "right" parameter useless

When the parameter "thumb" is used, "right" becomes useless and redundant. Using these two parameters together is really common. Even if the image display is not affected, "right" should be deleted. I think this detection is useful for cleaning up how pictures are specified. What about this one?

Good idea. But this is already on my to-do list (see: thumbs with forced size). -- sk 09:01, 4. Aug. 2009 (CEST)
Ok. Good news. So it will be done... "right" was not explicitly written there... and "thumbs with forced size" is a good detection; it should be "upright"... --II...Richard...II 17:07, 4. Aug. 2009 (CEST)

Image without description

This detection returns a lot of false positives. The problem was raised on the French Project:CheckWiki discussion page and agreed on as a suggestion. When an image is used as a plain image or in an infobox (or other template), the image description acts as the alternate description. The alternate-description problem is complicated and differs from the need for a description on a "thumb" image. "Images that really need a description" and "images that do not need a description" are mixed together (I know an alternate description is needed, but that is another problem). We'd like to modify error 30; two ways have been suggested:

  1. Detect only descriptions that are really needed: thumb and gallery (gallery is already detected), so only when "thumb" is added to the image. Plain images and images in infoboxes (and templates) can be left out.
  2. Split the error in two: the same detection (only for thumbs), plus a separate detection for plain images and images in infoboxes (and templates).

The first one could be the best (at least for France), since the alternate-description problem is not going to be resolved soon. This fix could make the image problem easier. Do you agree with these changes?

Maybe ignore pictures smaller than 50 px for this? JAn Dudík 08:48, 23. Jul. 2009 (CEST)
The script is stupid. It only detects images in the text. Every image should have a description, even the very small ones. Yes, this is a lot of work. The only way is to split this error: one with only thumbs, and the rest. -- sk 09:09, 4. Aug. 2009 (CEST)
Splitting it into two errors is fine. This would at least fix the images that really lack a description... "One with only thumbs and the rest", as you say... --II...Richard...II 17:07, 4. Aug. 2009 (CEST)

Exception list

The problem was raised on the French Project:CheckWiki discussion page and agreed on as a suggestion. A list of articles would be ignored by the script. A lot of false positives can only be fixed this way. For example, with error 058 (Titre de section en capitales / Headline ALL CAPS), some headlines MUST be all caps (e.g. fr:Variable d'environnement, fr:James L. Jones; these headlines have to be all caps, no way around it). Or with error 037 (DEFAULTSORT nécessaire manquant / DEFAULTSORT missing for titles with special letters), e.g. fr:々; this kind of article does not need DEFAULTSORT.
This list would be maintained by each checkwiki country/users. Each exception list would be separated by error (headline 1 = error 001, headline 2 = error 002, etc.) in order to keep the list easy to maintain. How and when to read and use the exception list (before each result-building process?), I can't really say, as I'm not able to program.
What do you think of this?

At the moment I am working on a concept for a whitelist, so that every language can list its articles to be excluded from the processing of an error. -- sk 09:11, 4. Aug. 2009 (CEST)
Good news... --II...Richard...II 17:07, 4. Aug. 2009 (CEST)
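
Editorial note: a per-error exception list as requested here could be as simple as the following Python sketch. The data structure and function names are hypothetical; the whitelist concept sk mentions is not described on this page. The titles are the frwiki examples from the request.

```python
# Hypothetical per-error exception lists, keyed by error number.
WHITELIST = {
    58: {"Variable d'environnement", "James L. Jones"},  # headline ALL CAPS
    37: {"々"},                                           # DEFAULTSORT missing
}


def is_excluded(error_number: int, title: str) -> bool:
    """True if the article is whitelisted for this particular error."""
    return title in WHITELIST.get(error_number, set())


def filter_findings(error_number: int, titles: list) -> list:
    """Drop whitelisted articles from the findings of one error."""
    return [title for title in titles if not is_excluded(error_number, title)]
```

Keeping one set per error number means an article excluded from error 058 is still checked for every other error.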

Output limit

We'd like to increase the output limit of errors displayed by the script from 50 to 100. The old version of the program returned 50 errors each day, while the new version only returns 50 every two days. As a consequence, the total number of errors rises, because errors are created faster than they are corrected. Could you please increase the maximum to 100 in order to restore the previous rate?

--II...Richard...II 17:23, 21. Jul. 2009 (CEST)

At the moment the biggest list is in enwiki with 272 KB. I have only one limit for all languages. Maybe I can change this for frwiki. I will try. --sk 09:16, 4. Aug. 2009 (CEST)
Ok, thanks... I wasn't sure that you would agree; I thought this could increase the scan time and so stress the server...
 Ok, I have changed the limit from 50 to 100 for frwiki only. -- sk 22:26, 18. Aug. 2009 (CEST)

Error 063

<sub><small>testo</small></sub> is detected, but <small><sub>testo</sub></small> isn't. Normal behaviour ? --II...Richard...II 16:31, 23. Jul. 2009 (CEST)Beantworten

Am I wrong? --II...Richard...II 17:07, 4. Aug. 2009 (CEST)
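
Editorial note: assuming error 063 is meant to flag a small tag combined with sub, as the two examples suggest, a check that covers both nesting orders could look like this Python sketch. The function name is made up, and this is not the script's actual test.

```python
import re

# Either nesting order:
#   <sub><small>...</small></sub>   or   <small><sub>...</sub></small>
NESTED = re.compile(
    r"<sub>\s*<small>.*?</small>\s*</sub>"
    r"|<small>\s*<sub>.*?</sub>\s*</small>",
    re.IGNORECASE | re.DOTALL,
)


def has_small_sub_nesting(wikitext: str) -> bool:
    """True if small and sub tags wrap each other in either order."""
    return NESTED.search(wikitext) is not None
```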

White space detection

Hello, can you add a new error: articles with long text that has whitespace at the beginning of the line? Whitespace can be used

for scripts or something like that,

but I think such scripts would be shorter than e.g. 80 characters. JAn Dudík 08:46, 23. Jul. 2009 (CEST)

I have had this idea too, but I have no good algorithm to detect this. There are too many problems at the moment, for example source or templates. -- sk 08:57, 4. Aug. 2009 (CEST)
pywikipediabot uses several exceptions for text inside various tags; maybe you can have a look at it. --Nemo bis 01:37, 12. Aug. 2009 (CEST)
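
Editorial note: the simplest form of JAn Dudík's idea, flagging lines that start with a space and exceed a length limit, can be sketched in Python as below. It deliberately ignores the hard part sk mentions (text inside source tags or templates); the function name and the default limit of 80 are taken from the suggestion, not from the script.

```python
def long_indented_lines(wikitext: str, limit: int = 80) -> list:
    """Return (line number, line) pairs for lines that begin with a
    space and are longer than `limit` characters."""
    return [
        (number, line)
        for number, line in enumerate(wikitext.splitlines(), start=1)
        if line.startswith(" ") and len(line) > limit
    ]
```

A real check would first have to mask regions where leading whitespace is intentional, which is what the pywikipediabot exceptions mentioned above are for.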

Undetected templates for #34

Hello, your script should detect the use of {{#ifexist|a|b}} templates, like here. JAn Dudík 08:50, 23. Jul. 2009 (CEST)

 Ok, I have inserted this in the script. -- sk 22:08, 3. Aug. 2009 (CEST)

Special characters in interwiki

Can you detect special characters in interwiki links, like after this edit? JAn Dudík 11:04, 23. Jul. 2009 (CEST)

Good idea. I have put this on my to-do list. -- sk 08:59, 4. Aug. 2009 (CEST)

More flexible error list?

Dear Stefan!

At the moment one of your programs fetches the file http://toolserver.org/~sk/checkwiki/huwiki/huwiki_translation.txt and puts the content of the fields error_XXX_head_script into the final error list. If no error is found, the field content is copied unchanged; otherwise it is surrounded with [[...]] markers.

I would like to add additional info to this table column, but the above mechanism does not allow it. However, I have a suggestion. If you find a template called chkwiki here, you should not add square brackets but overwrite the first parameter with the error count. I mean something like this:

error_XXX_head_script={{chkwiki|count=N|errno=XXX|msg0=cell_text_A|msg1=cell_text_B}}

The error count should be put in place of N. At this point we could write arbitrary templates that change the displayed text according to the error count. Sky is the limit. :-)
However, if the translated text does not begin with {{chkwiki|count=N|... your program would apply the current algorithm. This way compatibility with current-style translations is preserved.

What is your opinion? -- Bitman 08:01, 29. Jul. 2009 (CEST)
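
Editorial note: Bitman's proposal amounts to one extra branch when the head text is written out. The following Python sketch shows that logic under the rules stated above; the function name is hypothetical and the script's real code is not shown on this page.

```python
# Marker that switches on the proposed behaviour.
PREFIX = "{{chkwiki|count=N|"


def render_head(head_script: str, error_count: int) -> str:
    """Render the error_XXX_head_script field for the error list."""
    if head_script.startswith(PREFIX):
        # Proposed: fill in the count and leave the rest to the local template.
        return "{{chkwiki|count=%d|%s" % (error_count, head_script[len(PREFIX):])
    if error_count == 0:
        # Current behaviour: no errors, text is copied unchanged.
        return head_script
    # Current behaviour: errors exist, text becomes a wikilink.
    return "[[" + head_script + "]]"
```

Translations that do not start with the marker go through the two existing branches, so wikis that never create the template see no change.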

The problem is that every language needs this template. Every change must be made in all languages. At the moment we have more than 30 languages, and in the future there will be more. I think this is not practicable. -- sk 09:57, 4. Aug. 2009 (CEST)

Uhmm... I don't understand what you mean. Could you show an example? AFAIK the solution I suggested is totally independent of the number of checked wikis and languages. It is not necessary to write such a template in every wiki. If somebody needs it, he uses it. Other wiki maintainers do not have to care about it. They get the current internationalised error list. -- Bitman 18:45, 4. Aug. 2009 (CEST)

Of course national templates are created and maintained by local people. You have nothing to do with them. --Bitman 18:55, 4. Aug. 2009 (CEST)
Ok, I understand. You mean I should update the script so that every language can use its own template inside the translation. If I understand you right, the problem is the [[...]]. But I don't understand what you want to do with this template. I use the "error_XXX_head_script" only as a headline and inside the statistics table. Please describe the "Sky" to me in more detail :-) -- sk 21:50, 4. Aug. 2009 (CEST)
Yes, the problem is that there is no way to adapt to the [[...]] placed (or not) by your program. A localized template, however, would apply (or not) square brackets where necessary depending on the error count, while the other elements of the table cell remain fixed.
Actually I want to add a warning icon to items that are bot-correctable, so human editors would not waste their time editing these trivial errors.
Another advantage: at present our editors manually remove the wikilink, leaving the plain text in the table cell, after fixing all errors of a certain kind. It is a bit faster and easier to change the template parameter count from 11 to 0.
The sky? Version 2.0 of the template may also insert a smiley into the cell of solved errors. --Bitman 16:25, 7. Aug. 2009 (CEST)

Report error on pt.wikipedia

Hello, there is an error on pt:Wikipedia:Projetos/Check Wikipedia: the report shows no errors although they exist. Is it a script or a dump problem? Please respond at pt:Usuário Discussão:Alchimista. Alchimista 22:02, 1. Aug. 2009 (CEST)

The same for pl.wiki (Polish): the report is empty. PMG 19:03, 2. Aug. 2009 (CEST)
Hello, I have not changed the script in the last two weeks. Maybe there was a toolserver problem. -- sk 08:56, 4. Aug. 2009 (CEST)
Yes, there was a toolserver problem. The directory of the dumps was deleted. -- sk 21:52, 4. Aug. 2009 (CEST)

eo.wiki

Hi, I'm a user in pt.wiki (and sometimes, in eo.wiki, too). There's a broken link for the Check Wikipedia page for eo.wiki:

Instead of:

Please copy and paste that page at the toolserver to this page here.

It should be:

Please copy and paste that page at the toolserver to this page here.

One more thing: the error 006 (DEFAULTSORT with special letters) brings a false positive for eo.wiki, because they use DEFAŬLTORDIGO for DEFAULTSORT. Is it possible to avoid that? Is there something I can do?

Thanks in advance. Castelobranco 06:27, 5. Aug. 2009 (CEST)Beantworten

The last one (I think!): the page on the toolserver shows some special characters incorrectly, like Stefan Kühn, etc. Castelobranco 02:35, 6. Aug. 2009 (CEST)
Don't use the txt! Use the HTML file. -- sk 08:46, 6. Aug. 2009 (CEST)
Is this HTML file available for all wikis? --Superyetkin 11:22, 6. Aug. 2009 (CEST)
Yes, it is. (Sorry, I have fixed the link to ~sk/checkwiki/eowiki/eowiki_output_for_wikipedia.html.) -- sk 11:38, 6. Aug. 2009 (CEST)

Sorry, I didn't see that it was a link to a txt file. It's fine now, and I'm translating the page. Thanks! Castelobranco 16:08, 6. Aug. 2009 (CEST)

Spanish Wikipedia

In es:Wikiproyecto:Check Wikipedia, section 4.6 'DEFAULTSORT is missing and title with lowercase_letters' there is no table. The output is given as a simple list of links. Sabbut 11:02, 10. Aug. 2009 (CEST)Beantworten

This is normal. :-) -- sk 21:38, 18. Aug. 2009 (CEST)Beantworten

frwiki output errors

We also have some problems on the French project...

Check the output

Some tables contain many [[:]] and appear even if there are no errors...

Some errors show an abnormally large drop this morning. Yesterday I was working with some lists like "List of all articles with error 067", "List of all articles with error 050", and "List of all articles with error 018", for which I'm sure there are still thousands. --II...Richard...II 12:16, 10. Aug. 2009 (CEST)

I have changed the script and I think this could be the reason for the problem. I will check this tonight. -- sk 12:41, 10. Aug. 2009 (CEST)
Ok, thanks --II...Richard...II 13:29, 10. Aug. 2009 (CEST)
Ditto this; a lot of the numbers for en.wp went down dramatically. My bot's been doing some stuff over the past few days, but not this much! -Drilnoth (Talk) 17:49, 10. Aug. 2009 (CEST)
 Ok, I hope it will run. Yesterday I fixed a bug in the script. The problem was articles with "+" in the title, like A+ or C++. My script couldn't scan these articles. I fixed this and it worked very well. But then another problem was created: titles with "&" and ', like Command & Conquer or Detroit Bright's Goodyears. I hope I have fixed this problem. I have now started a new run for all languages. Please tell me if something is wrong. Thanks. -- sk 22:21, 10. Aug. 2009 (CEST)
Ah, that makes sense. Thanks! -Drilnoth (Talk) 22:40, 11. Aug. 2009 (CEST)
About this problem: all is fine except error 50 on fr. Only 11 are detected, but there are still 1500 errors. 1500 can't just disappear... --II...Richard...II 21:25, 12. Aug. 2009 (CEST)
Error 38 is wrong too (sorry!) --II...Richard...II 16:47, 13. Aug. 2009 (CEST)
I checked the whole list; error 18 is wrong too. It's the last one... --II...Richard...II 12:13, 14. Aug. 2009 (CEST)
Thanks for this feedback. I will check this. -- sk 21:20, 13. Aug. 2009 (CEST)

Ok, error 018 had a bug. I have fixed it. Error 038 is working very well. The short lists for errors 38 and 50 come from that one run with the empty list. With the next dump scan this will be ok. -- sk 22:06, 18. Aug. 2009 (CEST)

All is fine now. Other error detections have even found some articles the script was missing, here and there... --II...Richard...II 14:23, 21. Aug. 2009 (CEST)
I'm bumping this: look at the table in this archived version, where 257 errors for 038 disappear -- - Richard ⇔ 11:17, 6. Sep. 2009 (CEST)

Page move on id:

moin stefan. On id:, id:Wikipedia:WikiProjekt Check Wikipedia was moved to id:Wikipedia:ProyekWiki Cek Wikipedia (and consequently id:Wikipedia:WikiProjekt Check Wikipedia/Translation to id:Wikipedia:ProyekWiki Cek Wikipedia/Terjemahan). That probably still needs to be fixed; or is there more to consider for the translation to work?
In addition, fy:Wikipedy:WikiProject Check Wikipedia was moved to fy:Meidogger:Stefan Kühn/WikiProject Check Wikipedia --AwOc 15:58, 12. Aug. 2009 (CEST)

Thanks for the info. I will update this at the weekend. -- sk 21:21, 13. Aug. 2009 (CEST)
 Ok, added. -- sk 21:59, 18. Aug. 2009 (CEST)

Lowercase category name

Hello Stefan, I noticed that this error is not triggered for the last category in an article. For example, 15 Sagittae B had a Kategorie:doppelstern that went unnoticed since August 2008. Only after another category was added on 3 August 2009 was the error noticed. Andim 10:24, 13. Aug. 2009 (CEST)

I'll have to look again at where I'm losing the category. Thanks for the tip. -- sk 21:22, 13. Aug. 2009 (CEST)
 Ok, I found the bug. It was a copy-and-paste error on my part. Thanks again! -- sk 21:45, 18. Aug. 2009 (CEST)

Explanatory text for priority

I don't know whether anyone has brought this up before: I would find it useful if the translation page offered a way to define text that comes directly after the priority headings. This field should be empty by default and can be maintained locally. One could put information about the importance of the errors there, or similar. Der Umherirrende 12:16, 14. Aug. 2009 (CEST)

Hmm, I think every piece of information should stand directly with the error. Since the priority of an error can change through discussion, the text would be very maintenance-prone. I don't think it brings any advantages. So far no language has asked for this either. -- sk 21:48, 18. Aug. 2009 (CEST)
I was thinking more of a general explanation of errors here. For example something like "The following errors are processed by bot" or "The errors of this priority are only cosmetic; look for further possible improvements in the article", etc. Der Umherirrende 18:32, 20. Aug. 2009 (CEST)

Database dump?

Hi, if it's possible with the coding, could you run a full scan of the next database dump on en.wiki? My bot's been working on some of the articles, but the script hasn't really been detecting it (sometimes I work from the bottom of a list, sometimes from the top, but it doesn't seem to matter either way). Thanks! -Drilnoth (Talk) 17:58, 17. Aug. 2009 (CEST)

Hello Drilnoth, yes, it is possible. At the weekend I will start a dump scan of enwiki for you. The problem is that my scan needs a lot of time for enwiki, because it also scans all templates for the TemplateTiger. This takes time, and en now has more than 3 million articles. At the moment I have stopped the automatic dump scanning of all languages. If a language needs a new dump scan, please say so here. I am working on the script so that the scan will be faster in the future, but all languages are also growing. :-) -- sk 21:56, 18. Aug. 2009 (CEST)
Awesome; thank you. I understand that it is tough for the script, but just having a new dump scanned every few months is very helpful. Thanks! -Drilnoth (Talk) 17:56, 21. Aug. 2009 (CEST)
There was a problem with the toolserver. A blackout or so. I will try again next weekend. -- sk 16:43, 27. Aug. 2009 (CEST)
Okay; thanks for the update. -Drilnoth (Talk) 22:26, 29. Aug. 2009 (CEST)

Suggestion: category sort key identical to category name

Hello Stefan! I already mentioned this in connection with the Personendaten errors, but since it actually affects all articles (and the archive bot does not archive as quickly here), I'm reporting it here: now and then one finds categories like [[Kategorie:Mann|Mann]], which come from a vertical bar having been inserted by mistake, after which the software filled in the category name. For Kategorie:Mann ("men") there are currently 13 cases; in the January dump I found as many as 463 lines matching \[\[Kategorie:([^|]*)\|\1\]\]. --Schnark 09:34, 20. Aug. 2009 (CEST)
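
Editorial note: the regular expression quoted by Schnark works unchanged in Python, because the backreference `\1` requires the sort key to repeat the category name exactly. A minimal sketch (the function name is an assumption):

```python
import re

# Schnark's pattern: the text after the vertical bar must equal the
# captured category name.
SAME_SORTKEY = re.compile(r"\[\[Kategorie:([^|]*)\|\1\]\]")


def find_redundant_sortkeys(wikitext: str) -> list:
    """Return every category link whose sort key equals its own name."""
    return [match.group(0) for match in SAME_SORTKEY.finditer(wikitext)]
```

The namespace name is hard-coded as in the original pattern; other language editions would need their own localized prefix.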

Great idea. I will try it out. -- sk 13:25, 22. Aug. 2009 (CEST)

Split output files

moin. Can you add three additional output files, one for each error level? In the Chinese version they split it up like this because in one piece it gets too large due to the UTF-8 overhead. For other projects it could also make sense to split it this way. It would be helpful not to have to pick it apart by hand first. --xAwOc 16:38, 21. Aug. 2009 (CEST)

I am working on an implementation as a dynamic website; there this will not be a problem. I hope the Chinese Wikipedia can get by with the old mode until then. It could take a few more weeks, but the first tests already look quite good. -- sk 10:19, 26. Aug. 2009 (CEST)
Hello AwOc, the new interface has come quite far. I hope it already helps the Chinese Wikipedia, although everything is still in English. -- sk 07:56, 1. Sep. 2009 (CEST)

Help with arguments

Some French Wikipedia users doubt the need for some of the error detections.

Can you explain to me how to argue (quickly, of course...) for the goal of errors like 002 (better reading of the wiki log?), 018 and 022 (better reading of the wiki log?), 057, and some others like HTML entities? In fact, errors that bring no real changes to articles. --II...Richard...II 21:50, 25. Aug. 2009 (CEST)

  • 002: The XHTML standard is <br />. Everything else is wrong. You can also write <br>; my script will not report that. But all other spellings like <br\> <\br> </br> will be found.
  • 018: The standard in Wikipedia is "category:Island". You can also write "category:island", but it is not standard. It would be nice if all had a capital first letter. Why? If you search the dump for articles with this category, you only have to search for one spelling ("category:Island"). In the German Wikipedia we have no more than 20 of these problems, so this is easy. You in frwiki have over 30000. Maybe you will not fix this, and that is ok. Then you can deactivate this error.
  • 022: It is better for searching in the dump. I think this is no problem in frwiki. I see only one in frwiki.
  • 057: A normal headline does not end with ":". This is not good style.

I hope this helps. -- sk 10:33, 26. Aug. 2009 (CEST)

Thx for answering... --II...Richard...II 17:49, 27. Aug. 2009 (CEST)
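
The rule for error 002 described above could be sketched roughly as follows. This is only an illustration in Python, not the actual Check Wikipedia script (which is written in Perl); the names BR_LIKE, ACCEPTED and find_bad_br are made up for this example, and treating <br/> as acceptable is an assumption.

```python
import re

# Any tag-like token named "br", allowing a stray slash or backslash
# before or after the name.
BR_LIKE = re.compile(r"<\s*[/\\]?\s*br\s*[/\\]?\s*>", re.IGNORECASE)

# Spellings that are not reported. "<br />" and "<br>" come from the
# explanation above; "<br/>" is an assumption of this sketch.
ACCEPTED = {"<br />", "<br/>", "<br>"}

def find_bad_br(wikitext):
    """Return every br-like tag that is not one of the accepted spellings."""
    return [m.group(0) for m in BR_LIKE.finditer(wikitext)
            if m.group(0).lower() not in ACCEPTED]
```

Called on a text containing <br />, <br>, <br\>, <\br> and </br>, this returns only the last three spellings.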

[tt_news]

Hi. Could you also search for the texts [tt_news] and [backPid]? They should really always be escaped as [tt_news] and [backPid] (see the wiki source text). Many people don't know this, so it often ends up in articles (diff, diff) --xAwOc 05:06, 26. Aug. 2009 (CEST)

Can you explain to me where these things come from? This is the first I have heard of them. Do you have any background information? Maybe a Wikipedia article that explains this stuff to me. Thanks. -- sk 10:23, 26. Aug. 2009 (CEST)
Typo3 apparently uses it. In any case, it shows up fairly often in various URLs --xAwOc 15:24, 26. Aug. 2009 (CEST)

Database dump on ptwiki

If it's possible, could you run a full scan of the next database dump on pt.wiki? Thanks! Rjclaudio 23:04, 26. Aug. 2009 (CEST)

It is running. -- sk 16:42, 27. Aug. 2009 (CEST)
Just to know:
When, in theory, should one ask for a full scan?
What are the symptoms, reasons, or times to do it? --II...Richard...II 17:52, 27. Aug. 2009 (CEST)
The biggest reason: too much text! Symptom: it takes a lot of time. The last full scan of a frwiki dump needed 900 minutes (15 hours) for 3,330,035 pages. That is 3700 pages/min or 61 pages/second. I think this is fast. But if I calculate this for enwiki with 17,809,799 pages, I will need 4813 minutes or 80 hours! Also, I am not alone on the toolserver. ptwiki has only 1,957,261 pages, which is not so big. Maybe 8 hours. The long runtime is the reason why the automatic full scan was stopped. I think it is OK if we scan a dump every 2 or 3 months. After a scan we have a lot to do. Also, every day I check the new articles and recent changes, so many changes are included without a full scan. -- sk 19:12, 27. Aug. 2009 (CEST)
Sorry, I think I misunderstood your question. When to ask for a full scan? Answer: when there is no error left in the list! :-) Sometimes someone fixes only one error and needs new data. This can be a reason. -- sk 22:39, 27. Aug. 2009 (CEST)
Interesting, by the way... Thx --II...Richard...II 07:37, 28. Aug. 2009 (CEST)
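
The runtime estimate in this thread can be reproduced with a few lines of Python. This is only a sketch: it assumes that scan time scales linearly with the number of pages, using the frwiki figures quoted above as the reference; the function name estimate_minutes is made up for this example.

```python
def estimate_minutes(pages, ref_pages=3_330_035, ref_minutes=900):
    """Scale the measured frwiki scan time linearly to another dump size."""
    pages_per_minute = ref_pages / ref_minutes  # about 3700 pages/min
    return pages / pages_per_minute

for name, pages in [("enwiki", 17_809_799), ("ptwiki", 1_957_261)]:
    minutes = estimate_minutes(pages)
    print(f"{name}: {minutes:.0f} min, {minutes / 60:.1f} h")
```

For enwiki this gives roughly 4813 minutes (about 80 hours), matching the figure above; for ptwiki it gives roughly 529 minutes, a little under 9 hours.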


Template Navbox directly in articles

The script detects some templates written directly in articles (with error 085, detecting noinclude or includeonly). Some users use them this way (deleting the name and other parameters...). Example here with this navbox (I've got other examples if needed; I spent 2 hours cleaning up some of these two or three days ago). A navbox template must be in the "Template" namespace. Is it possible to detect whether these templates are used directly in articles?

For articles : en:Template:Navbox, fr:Modèle:Méta palette de navigation

I saw in your todo list "error 69 - no detect "ISBN-10:", "ISBN-13:", "(ISBN-10)", "(ISBN-13)" most before or after a ISBN"
MediaWiki transforms ISBN XXXXXXXX into a wikilink (I suppose MediaWiki does this in the de wiki too...). With ISBN-13 or ISBN-10 it does not... detecting this allows it to be corrected. I don't understand why you don't want to detect this?!
ISBN 1212541254
ISBN-10 1212541254 <- just see the difference
Detecting and cleaning this is a good thing, from my point of view.
The problem is that the code detects it even if there is no number after it (ISBN-XX without a number is then used as a title, for example, in a table...)... If this has already been discussed, just forget it -- - Richard ⇔ 21:30, 30. Aug. 2009 (CEST)
No, I haven't forgotten this. But at the moment I have no time. Currently I am working on a new interface for Check Wikipedia, so I have stopped work on everything else. I hope I can finish this interface in the next days. It will make it easier for all languages to work with Check Wikipedia. -- sk 14:04, 31. Aug. 2009 (CEST)
Ok. I will bump this thread in some time, so... if needed...
And I can't wait to see the new interface!! -- - Richard ⇔ 23:01, 31. Aug. 2009 (CEST)
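
One way to report "ISBN-10"/"ISBN-13" prefixes only when a number actually follows, as requested in this thread, is sketched below. This is a hypothetical Python illustration, not the logic of the real script; the names ISBN_PREFIX and find_prefixed_isbns are invented here, and the parenthesised forms "(ISBN-10)" and "(ISBN-13)" are deliberately not handled.

```python
import re

# "ISBN-10" or "ISBN-13", an optional colon, then a run that starts with
# a digit and ends with a digit or the check character X.
ISBN_PREFIX = re.compile(r"\bISBN-1[03]:?\s*(\d[\d\- ]*[\dXx])")

def find_prefixed_isbns(text):
    """Return the numbers that follow an ISBN-10/ISBN-13 prefix."""
    return [m.group(1) for m in ISBN_PREFIX.finditer(text)]
```

A plain "ISBN 1212541254" is left alone, and "ISBN-13" used as a table title without a number is not reported.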

New interface

Here is the new interface. The basic functionality is implemented, but I think I can add much more in the near future. If you have ideas for new features, tell me here. Next I will implement a whitelist and better updating of the data. -- sk

The links to the Japanese Wikipedia and its translation page are broken. Possibly a character encoding problem? --fryed-peach 10:48, 1. Sep. 2009 (CEST)
Yes, I have seen this too, also in other languages (ru, ar). I will fix this. -- sk 11:05, 1. Sep. 2009 (CEST)
Hi, I did some testing, and it appears to be handy. No request, only what I think ;) ... It's clean and "squared"... Page loading times are good. Colors... (there are all tastes in the world! Perhaps it will be tweakable...)... I don't know how you planned it... will an include on each project page be possible?... The done button is awesome! The possibility of a large output range (← 100 bis 125 →, for example) is really useful -- Cordialement - Richard ⇔ 1 septembre 2009 à 14:06 (CEST)
Hi. Just to be sure: will links like this one still be available? It's just so that tools are able to read the list of errors. --NicoV 14:23, 1. Sep. 2009 (CEST)
Maybe in the future I will implement this inside the script, so only the link will change. But the page will remain available, maybe under "&view=bot" or so. Is this OK for you? -- sk 14:28, 1. Sep. 2009 (CEST)
Yes, "&view=bot" should be OK. The idea is just to have a simple list (minimal formatting for simple parsing, ideally only a text file with one title per line) of all articles where a specific error has been detected. --NicoV 15:29, 1. Sep. 2009 (CEST)
  • (sorry for the bad English)
  • Could you add a "done" button that marks as done all articles that have a specific error? When using a (semi-)bot, clicking all 100+ "done" buttons is impossible.
  • A table sortable by id/description/article/notice.
  • In "High priority/Middle/Low", don't show (or show in a separate list, or hide) items that don't have any articles with errors.
  • Rjclaudio 21:13, 2. Sep. 2009 (CEST)
Hello Rjclaudio, to 1) this is a good idea, but sometimes we have vandals. I will look into this, but later. To 2) Yes, this will be possible. This is also my next idea. I will try it. First I must fix some basic problems with the database. To 3) Why is this useful? I think it is OK as it is, but I can also exclude these. -- sk 22:13, 2. Sep. 2009 (CEST)
Could this "done button" at least remove the errors on the current page (25 entries) instead of the whole error list? -- - Richard ⇔ 13:29, 3. Sep. 2009 (CEST)
Navigation problem, example: when I'm fixing problems in an error list (e.g. "Square brackets not correct begin") and I choose "more" for an article, I go to the "article error page" (I will call it that), and then it's hard and time-wasting to go back to the error list I came from (in my example "Square brackets not correct begin")... -- - Richard ⇔ 13:38, 3. Sep. 2009 (CEST)
Hello Richard, I hope I have fixed this "navigation problem" for you. The line is no longer deleted; only the "done" switches to "ok". So you can go back to the first page. -- sk 08:22, 4. Sep. 2009 (CEST)

I suggest more ways to group the errors. Some projects use "BOT" and "AWB" in the name. If this interface could group in one table all errors that AWB can fix, it would help a lot and would be a good advantage over the old version. And maybe not only AWB/BOT; the options could be customized in many ways by each project independently. Rjclaudio 13:00, 4. Sep. 2009 (CEST)

Yes, I saw it last night; it's far better and it resolves the problem. Nice -- - Richard ⇔ 13:21, 4. Sep. 2009 (CEST)

Hi Stefan, a question about the "Done" button. Does it mark the problem as solved (until the next run of Check Wiki?) so that people fixing errors can work more efficiently? I am still working on WikiCleaner to provide an interface for fixing the errors (hopefully a functional version before the end of the weekend). Is there a way for my tool to easily simulate a click on the "Done" button? --NicoV 16:02, 4. Sep. 2009 (CEST)

@Rjclaudio: If I understand you correctly, you want the info AWB/BOT or so for every error number. This is possible, but I need a list from AWB and the bots, maybe for every language? -- sk 16:38, 4. Sep. 2009 (CEST)
@NicoV: See the Done link. You only need to request http://toolserver.org/~sk/cgi-bin/checkwiki/checkwiki.cgi?project=dewiki&view=only&id=30&pageid=4534003 if you fix error 30 for page 4534003 in de. The script then updates the database from ok=0 to ok=1 and nothing more. At the moment, the next scan rescans all pages with ok=1. -- sk 16:38, 4. Sep. 2009 (CEST)
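
For a tool that wants to mark an entry as done, the link above can be assembled from its parts. The following Python sketch only builds the URL and does not send any request; the parameter names (project, view, id, pageid) are taken from the example link, while the function name done_url is made up here.

```python
from urllib.parse import urlencode

BASE_URL = "http://toolserver.org/~sk/cgi-bin/checkwiki/checkwiki.cgi"

def done_url(project, error_id, page_id):
    """Build the 'Done' link for one error on one page."""
    query = urlencode({
        "project": project,   # e.g. "dewiki"
        "view": "only",
        "id": error_id,       # error number, e.g. 30
        "pageid": page_id,    # page id of the fixed article
    })
    return f"{BASE_URL}?{query}"
```

done_url("dewiki", 30, 4534003) reproduces the example link from the comment above.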

Just like Wikipedia:Projetos/Check Wikipedia/Tradução, there could be another page to associate error <-> bot/awb/manual/semi-bot/etc. Or use something like "error_091_desc_script=", but as "error_091_clas_script=2" (clas = classification). And just like

#########################
# error description
#########################
# prio = -1 (unknown)
# prio = 0  (deactivated) 
# prio = 1  (top priority)
# prio = 2  (middle priority)
# prio = 3  (lowest priority)

do a

#########################
# clas description
#########################
# clas = 0  (manual)
# clas = 1  (awb)
# clas = 2  (bot)

but with an unlimited number of clas values (or max 10). Each language would use 3 (manual, awb, bot) or 5 (manual, partial awb, awb, partial bot, bot), or 20.

And it could help integrate the projects, working together to create rules for bot/awb to fix similar errors. In pt.wiki we have 52 errors that use bot/awb, and maybe other languages have rules for the others. This would make it easier to find help in other languages.

Rjclaudio 17:28, 4. Sep. 2009 (CEST)
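
To make the proposal concrete: if a translation page contained lines such as "error_091_clas_script=2", a tool could read them roughly as follows. This is a sketch of the suggestion only; the "clas" key does not exist in the script, and the names CLAS_LABELS, CLAS_LINE and parse_clas are hypothetical.

```python
import re

# Labels from the proposed "clas description" block above.
CLAS_LABELS = {0: "manual", 1: "awb", 2: "bot"}

# Matches the proposed line format, e.g. "error_091_clas_script=2".
CLAS_LINE = re.compile(r"^\s*error_(\d{3})_clas_script\s*=\s*(\d+)\s*$")

def parse_clas(config_text):
    """Map error number -> clas value, skipping comment lines."""
    result = {}
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # comment line such as "# clas = 2  (bot)"
        match = CLAS_LINE.match(line)
        if match:
            result[int(match.group(1))] = int(match.group(2))
    return result
```

Grouping all errors that a bot can fix is then a matter of selecting the error numbers whose clas value maps to "bot" in CLAS_LABELS.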

Maybe by showing AWB rules to fix some of the errors. In pt.wiki we did this for some errors, but something universal (that each project would adjust, like changing "Image" to "Imagem" in the rule) would be better. Rjclaudio 17:32, 4. Sep. 2009 (CEST)
When you are on a page, for example "75 to 100", and you click the done button, you go back to the previous page of 25 entries... -- - Richard ⇔ 01:05, 6. Sep. 2009 (CEST)
 Ok, I have fixed this bug. -- sk 13:51, 6. Sep. 2009 (CEST)

Hello, I love the new interface, but I am also begging for an "all done" button. :) --Ragimiri 13:37, 7. Sep. 2009 (CEST)

Ok, I will try to implement this. :-) But with many questions like "Are you sure?" -- sk 15:22, 7. Sep. 2009 (CEST)
 Ok, I have implemented this function. -- sk 22:03, 7. Sep. 2009 (CEST)
The description of error 063 contains a "small" tag. It creates a display bug... -- - Richard ⇔ 20:50, 7. Sep. 2009 (CEST)
At the moment all descriptions are bad, because they contain wiki syntax and not HTML. I will fix this with a translation page. -- sk 22:03, 7. Sep. 2009 (CEST)
If I'm right, errors are sorted into high/medium/low based on the script level and not the wiki project level (maybe this will come with the inclusion of the translation, because the levels are set there?).
An undefined table width is less useful with some errors. When the table is wider than the screen it's a pity... to see how it looks, see this error -- - Richard ⇔ 11:12, 10. Sep. 2009 (CEST)
Yes, at the moment only the script level is used. I don't see the problem with the table. I think flexible is OK. Please use new headlines for new requests. I don't like such long discussions. :-) -- sk 13:48, 10. Sep. 2009 (CEST)

Error 082 in Finnish wikipedia

All the links starting with [[Wikipedia: (linking to the Wikipedia namespace within fi-wiki) are included in the error report. --Jhattara 10:37, 1. Sep. 2009 (CEST)

IMHO: This is an error. We write an encyclopaedia and not a Wikipedia project. So every article should contain only links to other articles. Only under this condition can the data be used outside of Wikipedia, such as in a book or in another project. -- sk 11:10, 1. Sep. 2009 (CEST)
Most of the links to the Wikipedia namespace in Finnish Wikipedia are on the pages for years, decades, and centuries, where there is a link to the discussion about how to write time in Finnish Wikipedia. Those clutter the list beyond any usability. If the link [[Wikipedia:Keskustelua ajan merkitsemisestä Wikipediassa|ajan merkitseminen]] is included in errors, this error report will remain useless for the Finnish Wikipedia. --Jhattara 09:41, 2. Sep. 2009 (CEST)
Actually... Just checked that the link to the discussion is a redirect. The correct place it should link to in Finnish Wikipedia is [[Ohje:Merkitsemiskäytännöt]]. --Jhattara 09:43, 2. Sep. 2009 (CEST)
I understand the problem; we had the same in dewiki and in other languages. But this link should be placed on the discussion page or in a comment inside the article, not in the article text. For example: if I read an article about the year 2001, I do not want to read how to write this article. - Next I will implement a whitelist inside the new interface. I hope this will help with these problems. -- sk 10:33, 2. Sep. 2009 (CEST)

WikiCleaner

Hi,

I have started working on Wiki Cleaner to add features in it for fixing the errors detected by your script. Version 0.93 is the first one with this. It's not yet functional and I still have a lot of work to do on it, but the basics are visible.

Main things that need to be done:

  • Allow editing and saving the contents of the articles
  • Highlight detected errors directly in the text of the articles and propose fixes
  • Add other errors (currently only errors 48 and 80 are recognized)
  • Read complete list of articles on the tool server

If people have comments about this tool, please use my talk page on FR.

--NicoV 14:13, 1. Sep. 2009 (CEST)

v0.94 is available: the page text is scanned and errors are highlighted directly in the text. Still not functional, since editing and saving are not done. --NicoV 22:16, 1. Sep. 2009 (CEST)

Hi Stefan, I have released v0.95 that allows editing and saving the articles, and also detects other errors (11 types currently). --NicoV 19:45, 4. Sep. 2009 (CEST)

Ideas on the runtime problem

I read about the runtime problem in the section above and gave it some thought. I hope this helps you shorten the runtime while producing the same results. I also hope that you don't feel attacked by this and that this public form is acceptable to you. I would like to help, since I also consider some of the errors useful, and fixing them improves the quality of the articles. Unfortunately I cannot manage to always have the current dump myself. Unfortunately the number of suggested improvements is also too much for one person. Good luck. Der Umherirrende 18:56, 1. Sep. 2009 (CEST)

What I forgot to say: hats off to what you have achieved so far. If you want to implement a suggestion, it is best to do it separately from other changes and compare the results (output file or similar). Only then can you be sure that everything is correct (and notice a difference in runtime, which can also get worse). If you think the suggestions won't help, fine; you are the one who has to implement them, and I would not hold it against you. Der Umherirrende 19:19, 1. Sep. 2009 (CEST)


Wouldn't it also work to decide per project whether you need the large dump (All pages, current versions only.) or only the small dump (Articles, templates, image descriptions, and primary meta-pages.), and select accordingly? That would halve the runtime for en (I assume they have no special namespace) --Der Umherirrende 18:56, 1. Sep. 2009 (CEST)


If you search for something with foreach, you should exit the loop early once it has been found. According to this page, that is done with last (I have no knowledge of Perl programming). Some ifs inside loops can then be slimmed down as well. --Der Umherirrende 18:56, 1. Sep. 2009 (CEST)
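
As an illustration of the early-exit idea, here is the same pattern in Python, where break plays the role that last plays in a Perl foreach loop. The function contains_reflist and its counter are made up for this example and are not part of the actual script.

```python
def contains_reflist(template_names):
    """Return (found, checked): whether 'reflist' occurs, and how many
    names were looked at before the loop ended."""
    found = False
    checked = 0
    for name in template_names:
        checked += 1
        if name.lower() == "reflist":
            found = True
            break  # stop immediately; later names are never examined
    return found, checked
```

With the names Infobox, Reflist, Navbox and Commons, the loop stops after the second entry instead of visiting all four.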


I would do the namespace checks at the beginning, right after the article has been read, and not inside the individual errors. If the article is not in a relevant namespace, there is no need to parse the wikitext at all; it would all be discarded unused anyway. Another advantage is that you can control the namespaces more easily for individual projects. (In the initialization phase for the current project, define the relevant namespaces in an array that can then be checked against. For example, namespace 104 may suddenly not be of interest in other projects). Der Umherirrende 18:56, 1. Sep. 2009 (CEST)
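
The suggestion above might look roughly like this in Python. This is a sketch under stated assumptions: the per-project namespace sets in RELEVANT_NAMESPACES are invented values for illustration, and scan and should_check are hypothetical names, not functions of the real script.

```python
# Hypothetical per-project configuration: which namespaces are checked.
RELEVANT_NAMESPACES = {
    "dewiki": {0, 104},  # illustrative values only
    "enwiki": {0},
}

def should_check(project, namespace):
    """Decide once, before any parsing, whether a page is relevant."""
    return namespace in RELEVANT_NAMESPACES.get(project, {0})

def scan(project, pages, checks):
    """pages: iterable of (title, namespace, text); checks: functions text -> bool."""
    errors = []
    for title, namespace, text in pages:
        if not should_check(project, namespace):
            continue  # skip the page before any wikitext is taken apart
        for check in checks:
            if check(text):
                errors.append((title, check.__name__))
    return errors
```

The trade-off is the one raised in the reply below: a single up-front filter is cheaper, while per-error exclusion keeps each check free to choose its own namespaces.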


Great. Many thanks for the tips. Since I consider myself an advanced beginner in Perl, I gladly accept any tip. At the moment the focus is on the new interface, which is being well received. There are already enough errors listed there. But maybe during the long winter evenings I will get around to a real rewrite or a massive restructuring. A program like this usually grows organically, and then that can be quite time-consuming. I think I can get the biggest performance gain from some internal restructuring. I have already taken the dump point into account; I always use only the small ones. I already handle the namespaces like that: the namespace is determined at the beginning, and each error excludes namespaces individually. I wanted to stay as flexible as possible. I already break out of loops where possible. - The overall problem is simply growth. One always has to keep in mind that something may still work today but will no longer be possible in three years. That is why I want to move away from the dump towards a kind of live scan, in which e.g. the recent changes in the Wikipedias are checked regularly. In addition, I want to store a date for every article scan so as not to scan the same article three times a day. But that is still a long way off. -- sk 20:56, 1. Sep. 2009 (CEST)
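
The idea of storing a scan date per article, so that the same article is not scanned several times a day, could be sketched like this. It is a minimal Python illustration only; the function scan_once_per_day and the in-memory dictionary are assumptions of this example (the real tool would presumably keep the dates in its database).

```python
from datetime import date

def scan_once_per_day(titles, last_scanned, today):
    """Return the titles that still need a scan today and record them.

    last_scanned maps title -> date of the most recent scan.
    """
    to_scan = []
    for title in titles:
        if last_scanned.get(title) == today:
            continue  # already scanned today, skip
        last_scanned[title] = today
        to_scan.append(title)
    return to_scan
```

If an article shows up three times in the recent changes of one day, only its first occurrence leads to a scan.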

Error 61 in ptwiki

In the list for error 61 - Reference with punctuation (4-sep-09) there are some articles shown that do not have this error, like 105 Lélio Gama St. and 12758 (número). Rjclaudio 14:19, 4. Sep. 2009 (CEST)

I think this is from an old dump. If you want to be sure that the error is in the article, use this new page. There a bot can find all articles from the database that no user has set as "Done". You can set the limit there to 500 and also page through with the parameter "offset". I hope this will help you. -- sk 09:25, 7. Sep. 2009 (CEST)

Could you change the links at "List of all articles with error xxx" in the script to this new URL? Rjclaudio 01:38, 9. Sep. 2009 (CEST)

Suggestion for new errors with Defaultsort

Double Defaultsort, and Text after Defaultsort. Rjclaudio 14:19, 4. Sep. 2009 (CEST) 01:47, 6. Sep. 2009 (CEST)

Double Defaultsort is a good idea. I have written it on the to-do list. But Text after Defaultsort is not possible. I have no good algorithm to detect this in de, en, es or ja, ar ... -- sk 09:16, 7. Sep. 2009 (CEST)

If you can do this for categories, why can't you use the same algorithm? Maybe you could create an error specific to the languages where this is easy. Rjclaudio 01:35, 9. Sep. 2009 (CEST)

What about checking defaultsort when the article is not categorized? In this situation it's useless (example). There are some... but perhaps you are going to tell me it should be categorized? -- - Richard ⇔ 10:48, 9. Sep. 2009 (CEST)

DEFAULTSORT (006 and 037)

Like ca.wiki, the Esperanto project has another name for "DEFAULTSORT". We use DEFAUxLTORDIGO, which creates a special letter ("DEFAŬLTORDIGO"). We also have to keep some special letters in the sortkey ("Sahxarov" in the sortkey = Saĥarov). These "special letters" are allowed in that project: ĉ, ĝ, ĥ, ĵ, ŝ, ŭ and also Ĉ, Ĝ, Ĥ, Ĵ, Ŝ, Ŭ (in uppercase). This is because they are different letters from c, g, h, j, s and u. Could they be ignored by errors 006 and 037? If you need the Unicode code points, just let me know. Thanks in advance. Castelobranco 02:52, 7. Sep. 2009 (CEST)

These letters are written with an "-x" ("cx", "gx", "hx", etc.), but the eo MediaWiki - and, as I see, the Check Wikipedia dump too - recognizes them as diacritics (ĉ, ĝ, ĥ, etc.). Castelobranco 02:57, 7. Sep. 2009 (CEST)
Many thanks for this info. I will fix this bug. I have written it on my to-do list -- sk 09:04, 7. Sep. 2009 (CEST)
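
The x-notation described above can be illustrated with a small conversion function. This is a naive Python sketch, not the behaviour of MediaWiki or of the Check Wikipedia script: the names X_SYSTEM, X_PAIR and from_x_system are invented here, and the function converts every matching letter pair, including pairs in words where the x is a genuine letter.

```python
import re

# The six letter pairs named in the thread above.
X_SYSTEM = {"cx": "ĉ", "gx": "ĝ", "hx": "ĥ", "jx": "ĵ", "sx": "ŝ", "ux": "ŭ"}
X_PAIR = re.compile(r"[cghjsu]x", re.IGNORECASE)

def from_x_system(text):
    """Replace x-notation pairs with the accented Esperanto letters."""
    def replace(match):
        pair = match.group(0)
        letter = X_SYSTEM[pair.lower()]
        # Keep the case of the base letter: "Ux" -> "Ŭ", "ux" -> "ŭ".
        return letter.upper() if pair[0].isupper() else letter
    return X_PAIR.sub(replace, text)
```

from_x_system("Sahxarov") gives "Saĥarov" and from_x_system("DEFAUxLTORDIGO") gives "DEFAŬLTORDIGO", the two examples mentioned in the thread.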

Page moved (eo.wiki)

The Esperanto project page was moved from eo:Vikipedio:WikiProjekt Check Wikipedia to eo:Projekto:Check Wikipedia, because of the creation of the namespace Projekto. Should I do something to correct the interwikis? The translation page was also moved from eo:Vikipedio:WikiProjekt Check Wikipedia/Translation to eo:Projekto:Check Wikipedia/Tradukado. Thanks. Castelobranco 03:29, 7. Sep. 2009 (CEST)

Hello Castelobranco, thanks for this info. I will fix this in the script. Then, with the next scan on Wednesday, all pages will have the right interwiki link. -- sk 08:55, 7. Sep. 2009 (CEST)
 Ok, I have changed this in the script. -- sk 20:31, 7. Sep. 2009 (CEST)

Error Code 047:

Hello Stefan Kühn,

In my opinion these are not wrong: [5]] (Ordinaalgetal) is not a template, but has something to do with mathematics. Regards. --Algont 22:44, 7. Sep. 2009 (CEST)

Fixed. In that case simply put <nowiki> tags around it (or <math> tags, if that fits). --xAwOc 22:58, 7. Sep. 2009 (CEST)
<math></math> would be better. -- sk 08:36, 8. Sep. 2009 (CEST)

Bot-readable updates?

I notice that pages like [6] and [7] haven't been getting updated recently. It would be really nice if those could be updated, in addition to having the new interface, because it's much easier to use a bot when there is a plain-text list of articles to copy. Thanks! -Drilnoth (Talk) 04:54, 9. Sep. 2009 (CEST)

Problem in frwiki also: all projects seem to have been updated today, but with the results of Monday's scans. The new interface has not been accessible since last night... -- - Richard ⇔ 10:24, 9. Sep. 2009 (CEST)
There was a problem with the SQL server at the toolserver. This problem was fixed, but now the backup will be implemented. See this mail. - On the problem with the error lists: this is a bigger problem. At the moment I am happy that the new interface is running very well and that users are using it. Also, at the moment only the errors from the live Wikipedia (and not from a dump) are in the database. Only new articles and recent changes are scanned and inserted into the database. This is also the reason for the low number of errors in the new interface, maybe only 14000 or so in dewiki. In the dump the script finds over 100000 errors. In the next days I will create a diagram of the processes so that everyone understands the details. The biggest problem is that after a dump scan sometimes over 300000 articles must be scanned in the live Wikipedia, and this is too much. - For all users with bots I have implemented an output list in the new interface, as well as the function to set all articles as done. I hope this helps. -- sk 10:51, 9. Sep. 2009 (CEST)
Next problem: see this mail. -- sk 22:26, 9. Sep. 2009 (CEST)
Wonderful, it works. -- sk 13:48, 10. Sep. 2009 (CEST)

Table max width (new interface)

When the table is wider than the screen (my screen is only 19") it's not useful. Not all "Done" buttons are displayed; you must use the horizontal scrollbar... if you have to do this 10, 15, 20 times ("done" -> then H-bar, "done" -> then H-bar, "done" -> then H-bar, etc...) :-(

Whether you see the problem may depend on your screen width. Example (hoping it's wide enough on your screen); but looking at this, it seems to already have a maximum width, no?

Is there no way for it to be based on the OS screen resolution, for example? (I don't know if it's easy to code...!) -- - Richard ⇔ 16:31, 10. Sep. 2009 (CEST)

Hello Richard, the problem is mostly a single article with a long non-breakable notice, for example "{{Löschantragstext|tag=4|monat=September|jahr=2009|titel=Fachverband…". Once you have done that one, you have a smaller table. -- sk 20:52, 10. Sep. 2009 (CEST)
Yes, with "a long non-breakable notice"... Indeed, when there is one, it's not a problem, only when there are many... OK... it was only a suggestion... -- - Richard ⇔ 22:01, 10. Sep. 2009 (CEST)