
User:BrownHairedGirl/Articles with probably fixable bare links

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by BrownHairedGirl at 22:04, 30 November 2021 (Lists: Blanked, to prevent accidental re-submission. The latest batch "britannica.com + tcd.ie" (287 pages) is currently being processed by the bot). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Note

This page exists solely as a list of articles to be processed to clean up references which use bare URLs (see WP:Bare URLs). I do that by feeding these lists to Citation bot.

The lists have no significance other than as selections for cleanup. In the vast majority of cases, these are articles in which I have no interest other than fixing bare URL references.

Bare URLs

Note that the definition of "bare URL" used here is narrow: ref tags which contain only the URL, optionally preceded or followed by spaces, and/or enclosed in square brackets [].

For example:

  • bare URL, with no spaces: <ref>https://www.example.com/foo</ref>
  • bare URL, with spaces: <ref> https://www.example2.com/foobar </ref>
  • bracketed URL, with no spaces: <ref>[https://www.example.com/foo]</ref>
  • bracketed URL, with spaces: <ref> [https://www.example2.com/foobar] </ref>
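The narrow definition above can be written as a single regular expression. What follows is an illustrative Python sketch only. It is not the tool used to build these lists, which are made with AWB as described below. The names `BARE_REF`, `is_bare_url_ref` and `find_bare_urls` are hypothetical. The sketch makes two assumptions. First, attributes on the opening tag (such as `name=`) are allowed. Second, a titled link such as `[https://www.example.com/foo Foo]` does not count as bare.

```python
from __future__ import annotations

import re

# Hypothetical pattern for the narrow definition: a <ref> whose whole content is
# one URL, optionally padded with whitespace and optionally wrapped in [ ].
BARE_REF = re.compile(
    r"<ref(?:\s[^>]*)?(?<!/)>"  # opening tag; attributes allowed, not self-closing
    r"\s*(\[)?"                 # optional whitespace, optional opening bracket
    r"(https?://[^\s\[\]<>]+)"  # the URL: no whitespace, brackets or angle brackets
    r"(?(1)\])"                 # closing bracket required only if one was opened
    r"\s*</ref>",               # optional whitespace, closing tag
    re.IGNORECASE,
)


def is_bare_url_ref(ref: str) -> bool:
    """True if this single <ref>...</ref> element is a bare URL ref."""
    return BARE_REF.fullmatch(ref.strip()) is not None


def find_bare_urls(wikitext: str) -> list[str]:
    """Return the URL of every bare URL ref in a page's wikitext."""
    return [m.group(2) for m in BARE_REF.finditer(wikitext)]


print(find_bare_urls('Text.<ref name="a"> [https://www.example.com/foo] </ref>'))
# ['https://www.example.com/foo']
```

The conditional group `(?(1)\])` requires a closing bracket only when an opening bracket was present. Unbalanced forms such as `<ref>https://www.example.com/foo]</ref>` therefore do not match.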

There are of course many other types of inadequately described citation. This exercise targets only the simplest, worst examples. However, when Citation bot processes a page, it can fix many other citation issues, so this exercise fixes more than just the targeted problem.

Selection

These lists consist of articles which have one or more bare URL refs (as defined above) to a website where Citation bot can usually fill the reference. Procedure:

  1. Take one or more websites where Citation bot has shown that it can fill a bare URL
  2. Use AWB's "Wiki search" function to find pages with probably-bare links to those websites
  3. Scan that list to keep only pages which actually have a bare URL ref to one of those websites, using AWB's "skip"/"Doesn't contain" function in pre-parse mode
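Steps 2 and 3 amount to filtering a set of pages with a pattern built for specific domains. The following Python sketch assumes the wikitext of each page is already available as a mapping from title to text, for example from a database dump. It is not AWB's own implementation. The names `bare_ref_pattern` and `pages_to_keep` are hypothetical. The sketch builds the domain alternation from arbitrary domain names with `re.escape`, so a mixed batch such as britannica.com + tcd.ie is handled as well as several `.com` sites.

```python
from __future__ import annotations

import re
from collections.abc import Iterable, Mapping


def bare_ref_pattern(domains: Iterable[str]) -> re.Pattern[str]:
    """Compile a pattern for bare URL refs pointing at any of the given domains."""
    alternation = "|".join(re.escape(d) for d in domains)  # dots escaped
    return re.compile(
        r"<ref(?:\s[^>]*)?(?<!/)>\s*"  # opening <ref> tag (not self-closing)
        r"(\[)?"                       # optional opening bracket
        r"https?://(?:www\.)?"         # scheme and optional www.
        rf"(?:{alternation})"          # one of the target domains
        r"(?:[/?#][^\s\[\]<>]*)?"      # optional path, query or fragment
        r"(?(1)\])"                    # closing bracket only if one was opened
        r"\s*</ref>",                  # optional whitespace, closing tag
        re.IGNORECASE,
    )


def pages_to_keep(pages: Mapping[str, str], domains: Iterable[str]) -> list[str]:
    """Titles of pages with at least one bare URL ref to one of the domains."""
    pattern = bare_ref_pattern(domains)
    return [title for title, text in pages.items() if pattern.search(text)]


pages = {
    "Example A": "Text.<ref>https://www.britannica.com/topic/foo</ref>",
    "Example B": "Text.<ref>{{cite web |url=https://www.britannica.com/topic/foo}}</ref>",
    "Example C": "Text.<ref> [https://www.tcd.ie/about/] </ref>",
}
print(pages_to_keep(pages, ["britannica.com", "tcd.ie"]))  # ['Example A', 'Example C']
```

After the domain, the URL must continue with `/`, `?` or `#`, or end there. This stops look-alike hosts such as `britannica.com.example.net` from matching.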

For example, the first batch processed this way covered three websites: eurosport.com + atptour.com + tennis.com

  • Search term insource:/\<ref[^\>]*\>\s*https?:\/\/(www\.)?(eurosport|atptour|tennis)\.com/i
  • Doesn't contain: <ref[^>]*?>\s*\[?\s*https?://(www\.)?(eurosport|atptour|tennis)\.com[^>< \|\[\]]+(?<!\.(txt|pdf|jpg|jpeg|png))\s*\]?\s*(\{\{Bare +URL +inline\s*(\|[^\}\{\>\<)]*)?\}\}\s*)?<\s*/\s*ref\b
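Python's `re` module cannot compile the "Doesn't contain" pattern verbatim. It rejects the variable-width look-behind `(?<!\.(txt|pdf|jpg|jpeg|png))`, which AWB accepts because it uses the .NET regex engine. If you want to test the pattern outside AWB, the sketch below transcribes it by splitting that look-behind into one fixed-width look-behind per extension. The names `AWB_FILTER` and `has_bare_ref` are hypothetical. Case-insensitive matching is an assumption, because the pattern itself does not specify it.

```python
import re

# The "Doesn't contain" pattern above, transcribed for Python's re module.
# Python rejects variable-width look-behinds, so (?<!\.(txt|pdf|jpg|jpeg|png))
# is split into one fixed-width look-behind per file extension.
AWB_FILTER = re.compile(
    r"<ref[^>]*?>\s*\[?\s*https?://(www\.)?(eurosport|atptour|tennis)\.com"
    r"[^>< \|\[\]]+"                                        # rest of the URL
    r"(?<!\.txt)(?<!\.pdf)(?<!\.jpg)(?<!\.jpeg)(?<!\.png)"  # not a file link
    r"\s*\]?\s*"                                            # optional closing bracket
    r"(\{\{Bare +URL +inline\s*(\|[^\}\{\>\<)]*)?\}\}\s*)?"  # optional tag
    r"<\s*/\s*ref\b",
    re.IGNORECASE,  # assumption: case-insensitive matching
)


def has_bare_ref(wikitext: str) -> bool:
    """True if the page contains a match, i.e. it is kept rather than skipped."""
    return AWB_FILTER.search(wikitext) is not None


print(has_bare_ref("<ref>https://www.tennis.com/draw.pdf</ref>"))  # False
```

A page for which `has_bare_ref` returns `False` corresponds to one that the "Doesn't contain" skip setting removes from the list.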

Note that both the search term and the "Doesn't contain" filter above use regular expressions (regex). Don't try using this method unless you are comfortable using regex.

Updates

Please note that this page is updated with a new list once processing of a batch has started, and sometimes before that batch has finished. So if you have come to this page from a mention in an edit summary, the current version of this page may not be the list used for that series of edits. For earlier versions, see this page's history.

Lists

Blanked, to prevent accidental re-submission.
The latest batch "britannica.com + tcd.ie" (287 pages) is currently being processed by the bot, as of 22:04, 30 November 2021 (UTC)