Wikipedia:WikiProject Red Link Recovery/Unlikely links

This is an old revision of this page, as edited by Topbanana at 12:04, 1 February 2012 (Target page). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This page is for the discussion of the Unlikely Links tool, hosted on the toolserver at http://toolserver.org/~tb/unlikely/.


Ideas for future unlikeliness checks

  • Characters in the UTF-16 range may indicate corruption or untranslated foreign-language links
  • Anything that would trigger a rule from MediaWiki:Titleblacklist - if a page cannot be created for the target of a link, that link is suspect.
  • Mixes of language-specific characters - for example Icelandic and Romanian specific characters in the same red link
  • Badly formed template links

- TB (talk) 22:31, 15 December 2010 (UTC)
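
The "mixes of language-specific characters" idea above could be sketched roughly as follows. This is only an illustration: the marker sets are assumptions picked for the example (they are neither exhaustive nor truly unique to each language), and the function names are hypothetical rather than anything the tool is known to use.

```python
# Hypothetical marker sets: characters treated here as hints for one language.
# These are assumptions for illustration and are far from complete.
LANGUAGE_MARKERS = {
    "Icelandic": set("\u00f0\u00fe\u00d0\u00de"),  # ð þ Ð Þ
    "Romanian": set("\u0103\u0219\u021b\u0102\u0218\u021a"),  # ă ș ț Ă Ș Ț
}


def languages_hinted(title):
    """Return the sorted names of languages whose marker characters appear in title."""
    chars = set(title)
    return sorted(
        name for name, markers in LANGUAGE_MARKERS.items() if chars & markers
    )


def is_mixed(title):
    """True if the title contains marker characters from more than one language."""
    return len(languages_hinted(title)) > 1
```

A red link flagged by `is_mixed` would only be a candidate for review, since legitimate titles can mix such characters.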

Common Double Letters

ii is a reasonably common double letter - skiing, Hawaii, various star names - perhaps it should be excluded from uncommon double letters. welsh (talk) 04:54, 24 December 2010 (UTC)

I've removed double i's for now. The letters in use were chosen by counting all instances of double letters in article titles and selecting the least common 5. 'i' was indeed the most commonly present of the five selected. - TB (talk) 19:40, 24 December 2010 (UTC)
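
For anyone wanting to reproduce the selection described above, here is a minimal sketch of counting double letters in a list of titles and picking the least common ones. The function names are hypothetical, it only considers the letters a-z, and it counts overlapping instances (so "aaa" counts as two "aa"); whether the original count worked this way is an assumption.

```python
from collections import Counter
import string


def double_letter_counts(titles):
    """Count every adjacent pair of identical a-z letters across all titles."""
    # Start every pair at zero so that pairs which never occur are still ranked.
    counts = Counter({letter * 2: 0 for letter in string.ascii_lowercase})
    for title in titles:
        lowered = title.lower()
        for first, second in zip(lowered, lowered[1:]):
            if first == second and first in string.ascii_lowercase:
                counts[first + second] += 1
    return counts


def least_common_doubles(titles, n=5):
    """Return the n rarest double letters, ties broken alphabetically."""
    counts = double_letter_counts(titles)
    return sorted(counts, key=lambda pair: (counts[pair], pair))[:n]
```

Run over the full set of article titles, `least_common_doubles(titles)` would give a candidate list of five, which could then be trimmed by hand as was done with 'ii'.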

Namespaces

The tool does not display whether a page is in the Portal: or Template: space, but rather leaves it unmarked, which then defaults to Main:. It's easy to see what's going on by doing a "What links here" on the redlink, but flagging the namespace would be better. welsh (talk) 14:11, 24 December 2010 (UTC)

Fixed. - TB (talk) 19:33, 24 December 2010 (UTC)
Thanks welsh (talk) 12:40, 8 January 2011 (UTC)

Slow

The suggestions from the tool are taking a long time to display - several minutes in some cases. Is there anything like a tweak to indexes that could fix this? For example, triple letters towards the end of the alphabet. welsh (talk) 12:40, 8 January 2011 (UTC)

Alas, a simple index won't do the job in this case. Currently, a list of all red links in the English-language Wikipedia is maintained and searched on demand for any matching a particular pattern (the patterns can be seen here). The list is too long to brute-force search quickly, and the patterns too varied to index effectively. The real solution is, I suppose, to store pre-calculated lists, as the RLRL tool does - however, in the longer term, I'm hoping to transform the tool into a more generalised 'red-link explorer', hence its simplistic design for now. I'll ponder the matter more - inspiration might strike yet ;) - TB (talk) 10:16, 13 January 2011 (UTC)
I've adjusted a few things to hopefully improve performance a bit. More to come. - TB (talk) 21:48, 27 May 2011 (UTC)
I noticed the refresh was faster even without knowing anything had been changed! Well done welsh (talk) 23:32, 27 May 2011 (UTC)
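
To illustrate the "pre-calculated lists" idea discussed in this thread, here is a rough sketch: every pattern is applied in a single pass over the red-link list, and the matches are stored so that a later request is a dictionary lookup instead of a fresh scan. The pattern names and regular expressions below are made up for the example and are not the tool's actual patterns.

```python
import re

# Hypothetical patterns, for illustration only.
PATTERNS = {
    "triple_letter": re.compile(r"([A-Za-z])\1\1"),
    "double_q": re.compile(r"qq"),
}


def precompute(red_links, patterns):
    """Scan the red-link list once and collect the matches for every pattern."""
    results = {name: [] for name in patterns}
    for link in red_links:
        for name, pattern in patterns.items():
            if pattern.search(link):
                results[name].append(link)
    return results
```

The trade-off is that the stored lists go stale until the next rebuild, and a brand-new pattern still needs one full scan before it can be served.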

List rebuilt

Redlink list rebuilt, and a few tweaks made to the tool to make it deal more sensibly with large numbers of whitelisted entries. - TB (talk) 07:54, 26 April 2011 (UTC)

New pattern added - 'All uppercase'

New pattern added - 'All uppercase'. This shows red links that are ALL IN UPPER CASE, of course ;) - TB (talk) 17:36, 27 April 2011 (UTC)

Cool new set! Lots of whitelist candidates (ships, satellites, asteroids, international standards, radio stations...) but many positives too. welsh (talk) 06:57, 28 April 2011 (UTC)
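
As a small illustration of how an 'All uppercase' check might sit alongside a whitelist, here is a sketch. The function name and the whitelist entry are hypothetical; the tool's real implementation is not shown on this page.

```python
def flag_uppercase(red_links, whitelist=frozenset()):
    """Return red links written entirely in upper case, skipping whitelisted ones.

    str.isupper() is True when the string has at least one cased character
    and no lowercase ones, so digits and punctuation are ignored.
    """
    return [
        title
        for title in red_links
        if title.isupper() and title not in whitelist
    ]
```

For example, with `whitelist={"HMS EXAMPLE"}` (a made-up entry), that title is skipped while other all-capitals links are still listed.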

New pattern added - 'Offensive words'

New pattern added - 'Offensive words'. This shows red links matching a small selection of offensive English-language words. - TB (talk) 21:21, 21 May 2011 (UTC)

Sorting lists

Sometimes, maybe just for variety or efficiency of editing, it would be good to see the lists sorted by Containing Article rather than bad link name. This would be particularly useful in the very long ALL UPPERCASE class. welsh (talk) 09:18, 22 May 2011 (UTC)

I quite agree - in general the facilities for navigating lists of unlikely links are pretty crude. I'll see if I can't graft on a more flexible set of tools, hopefully including the ability to sort and further filter lists. - TB (talk) 11:22, 22 May 2011 (UTC)
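
A sketch of the sorting option requested above, assuming each row of a list is simply a (containing article, red link) pair. The `UnlikelyLink` type and `sort_links` function are hypothetical names used only for this example.

```python
from typing import NamedTuple


class UnlikelyLink(NamedTuple):
    containing_article: str
    red_link: str


def sort_links(rows, by="red_link"):
    """Sort rows by red link (default) or by containing article, ignoring case."""
    if by == "containing_article":
        def key(row):
            return (row.containing_article.casefold(), row.red_link.casefold())
    else:
        def key(row):
            return (row.red_link.casefold(), row.containing_article.casefold())
    return sorted(rows, key=key)
```

Sorting by containing article groups all the suspect links from one page together, which is what makes fixing them in a single edit easier.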

Which way forwards?

Okay, I've tried quite a few approaches to improving this tool but can find nothing that satisfies me, so I'm soliciting input on what folks want. My original intention was that it develop into a 'red link explorer' tool, allowing users to flexibly generate lists of red links of interest, hopefully for the purpose of fixing them. It turns out that there are a couple of showstoppers making this infeasible:

  1. The way the MediaWiki database is structured makes it time consuming to generate a list of all red links (around 4 hours currently)
  2. Likewise, the database structure makes it very hard to maintain such a list - normally one could run through the hundreds of edits made each minute and add/remove red links to keep the list of all red links up to date. Not possible :(
  3. The list of red links is large enough that waving it past even a simple regular expression takes a double-digit number of seconds. Running arbitrary user-generated queries is likely to be problem-prone.

So, a new vision is needed. Anyone? - TB (talk) 20:33, 6 July 2011 (UTC)

New pattern added - 'Double disambiguation'

New pattern added - 'Double disambiguation'. This shows red links ending in two bracketed terms - for example 1906_Australasian_Championships_(tennis)_(tennis) - TB (talk) 14:46, 25 August 2011 (UTC)
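
One way the 'Double disambiguation' check could be expressed is with a regular expression like the one below. This is a guess at an equivalent pattern, not the expression the tool actually uses; it assumes the two bracketed terms are separated by a space or an underscore.

```python
import re

# Two bracketed terms at the very end of the title, each preceded by a
# space or underscore. An assumed pattern, for illustration only.
DOUBLE_DISAMBIGUATION = re.compile(r"[ _]\([^()]+\)[ _]\([^()]+\)$")


def has_double_disambiguation(title):
    """True if the title ends in two bracketed terms."""
    return DOUBLE_DISAMBIGUATION.search(title) is not None
```

The example from the announcement, `1906_Australasian_Championships_(tennis)_(tennis)`, matches this pattern, while the same title with a single `(tennis)` does not.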

Target page

How about looking for links to "Target page name"? You get those when you click on the "redirect" icon in the edit box and don't change the text. I've fixed a few of those a few times. ospalh (talk) 19:04, 20 September 2011 (UTC)

Hi Ospalh. Nice idea - that's a new one by me, I tend to not use the javascripty goodies. The list you're after can be found using the normal "What Links Here" tool. Thinking this over, there are a few other similar "error indicator links" we should probably be checking periodically also:
Can you think of any more? - TB (talk) 19:42, 20 September 2011 (UTC)

New set: Sabha constituencies

There are around 550 Lok Sabha constituencies, all of which AFAIK have pages. Spelling variations seem rife, so all redlinks in this set should be fixable. - TB (talk) 12:04, 1 February 2012 (UTC)
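
Since the red links in this set are mostly spelling variations of existing titles, a fuzzy match against the list of known constituency pages might suggest fixes. The sketch below uses Python's standard `difflib`; the function name, the cutoff value and the sample titles in the usage note are assumptions for illustration only.

```python
import difflib


def suggest_target(red_link, known_titles, cutoff=0.8):
    """Return the closest known title to red_link, or None if nothing is close."""
    matches = difflib.get_close_matches(red_link, known_titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For instance, given a known title "Varanasi (Lok Sabha constituency)", the misspelt "Varanassi (Lok Sabha constituency)" would be matched to it. Because every title shares the same bracketed suffix, similarity scores are inflated, so suggestions would still need a human check; comparing only the place-name part may be safer.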