User:SchlurcherBot
This user account is a bot that uses C#, operated by Schlurcher (talk). It is not a sock puppet, but rather an automated or semi-automated account for making repetitive edits that would be extremely tedious to do manually.
Administrators: if this bot is malfunctioning or causing harm, please block it.
Emergency bot shutoff button: Administrators, use this button if the bot is malfunctioning (direct link).
Function overview: Convert links from http:// to https://
Programming language: C#
Source code available: Main C# script: commons:User:SchlurcherBot/LinkChecker
Function details: The link checking algorithm is as follows:
- The bot extracts all http-links from the parsed HTML code of a page (a minimal extraction sketch is given after this list)
- It searches for all href elements and extracts the links
- It does not search the wikitext and thus does not rely on any regular expressions
- This also avoids problems with templates that modify links (such as archiving templates)
- Links that are subsets of other links are filtered out to minimize search-and-replace errors
- The bot checks whether the identified http-links also occur in the wikitext; links that do not are skipped
- The bot checks whether both the http-link and the corresponding https-link are accessible (see the verification sketch after this list)
- This step also uses a blacklist of domains that were previously identified as not accessible
- If both links redirect to the same page, the http-link will be replaced by the https-link (the link is not changed to the redirect target; the original link path is kept)
- If both links are accessible and return a success code (2xx), the bot checks whether the content is identical
- If the content is identical and the link points directly to the host, the http-link will be replaced by the https-link
- If the content is identical but the link does not point directly to the host, the content is additionally compared with the host's root page; only if those contents differ will the http-link be replaced by the https-link
- This step was added because some hosts return the same content for all of their pages (for example most domain sellers, some news sites, or pages under ongoing maintenance)
- If the content is not identical, the bot checks whether it is at least 99.9% identical (calculated via the en:Levenshtein distance; see the similarity sketch after this list)
- This step was added because most homepages use dynamic IDs for certain elements, such as ad containers designed to circumvent ad blockers.
- If the content is at least 99.9% identical, the same host check as above is performed.
- If any of the checked links fails (for example with a 404 code), nothing is changed.
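The published script at commons:User:SchlurcherBot/LinkChecker is authoritative; the following is only a minimal, hypothetical sketch of the extraction step, assuming the HtmlAgilityPack library (the real bot may use a different HTML parser). It pulls http:// targets out of the href attributes of the rendered page and drops links that are substrings of other links.

```csharp
// Hypothetical sketch of the extraction step (assumes HtmlAgilityPack; not the actual bot code).
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinkExtractor
{
    public static List<string> ExtractHttpLinks(string renderedHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(renderedHtml);

        // Collect every href attribute that still uses the plain http scheme.
        var links = doc.DocumentNode
            .SelectNodes("//a[@href]")
            ?.Select(a => a.GetAttributeValue("href", ""))
            .Where(h => h.StartsWith("http://", StringComparison.OrdinalIgnoreCase))
            .Distinct()
            .ToList() ?? new List<string>();

        // Drop links that are substrings of other links, so a later
        // search-and-replace in the wikitext cannot hit the wrong occurrence.
        return links
            .Where(l => !links.Any(other => other != l && other.Contains(l)))
            .ToList();
    }
}
```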
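The accessibility and redirect check could look roughly like the sketch below (again hypothetical and simplified; the real script also keeps a blacklist of inaccessible domains and performs the full content comparison described above).

```csharp
// Hypothetical, simplified sketch of the accessibility/redirect check (not the actual bot code).
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class LinkVerifier
{
    private static readonly HttpClient Client =
        new HttpClient(new HttpClientHandler { AllowAutoRedirect = true });

    private static async Task<(bool Ok, Uri FinalUri, string Body)> FetchAsync(string url)
    {
        try
        {
            using var response = await Client.GetAsync(url);
            if (!response.IsSuccessStatusCode)              // e.g. 404: treat as not accessible
                return (false, null, null);
            return (true,
                    response.RequestMessage.RequestUri,     // final URI after redirects
                    await response.Content.ReadAsStringAsync());
        }
        catch (HttpRequestException)
        {
            return (false, null, null);                     // network error: not accessible
        }
    }

    public static async Task<bool> ShouldUpgradeAsync(string httpUrl)
    {
        var httpsUrl = "https://" + httpUrl.Substring("http://".Length);

        var http = await FetchAsync(httpUrl);
        var https = await FetchAsync(httpsUrl);
        if (!http.Ok || !https.Ok)
            return false;                                   // any failure: leave the link alone

        if (http.FinalUri.Equals(https.FinalUri))
            return true;                                    // both redirect to the same page

        // Otherwise compare page contents; the real check also accepts near-identical
        // content (>= 99.9 % similarity, see the next sketch) and guards against hosts
        // that serve the same content for every URL.
        return http.Body == https.Body;
    }
}
```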
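The 99.9% threshold can be expressed as a normalized Levenshtein similarity; one possible implementation (not necessarily the one the bot uses) is:

```csharp
// Hypothetical sketch of the 99.9 % similarity check (not the actual bot code).
using System;

public static class ContentComparer
{
    // Returns a value in [0, 1]; 1.0 means the two strings are identical.
    public static double Similarity(string a, string b)
    {
        int maxLen = Math.Max(a.Length, b.Length);
        if (maxLen == 0)
            return 1.0;

        // Two-row Levenshtein distance, so memory stays linear in the page size.
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev);
        }

        return 1.0 - (double)prev[b.Length] / maxLen;
    }

    // Pages count as "identical enough" when they are at least 99.9 % similar.
    public static bool AreAlmostIdentical(string a, string b) => Similarity(a, b) >= 0.999;
}
```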
Source for pages: The bot works on the list of pages identified through the external links SQL dump. The list was shuffled to ensure that subsequent edits are not clustered in a specific area (a minimal shuffling sketch follows).
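The scrambling can be as simple as an in-place shuffle of the page list, for example (hypothetical sketch, not the published script):

```csharp
// Hypothetical sketch: shuffle the page list so consecutive edits are spread
// across the whole dump instead of clustering in one area (not the actual bot code).
using System;
using System.Collections.Generic;

public static class PageListShuffler
{
    public static void Shuffle(IList<string> pages)
    {
        var rng = new Random();
        // Fisher-Yates shuffle.
        for (int i = pages.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (pages[i], pages[j]) = (pages[j], pages[i]);
        }
    }
}
```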
Further comments: The bot respects the API:Etiquette guidelines: it sends a user-agent header and honours the maxlag parameter (see the request sketch below).
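A polite API request along those lines could be configured as follows (hypothetical sketch; the user-agent string and retry handling shown here are assumptions, not the bot's actual values):

```csharp
// Hypothetical sketch of a polite MediaWiki API request with a user-agent header
// and the maxlag parameter (not the actual bot code; the user-agent string is made up).
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class ApiClient
{
    private static readonly HttpClient Client = CreateClient();

    private static HttpClient CreateClient()
    {
        var client = new HttpClient();
        // API:Etiquette asks for a descriptive user-agent with contact information.
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "User-Agent",
            "SchlurcherBot/1.0 (https://commons.wikimedia.org/wiki/User:SchlurcherBot)");
        return client;
    }

    public static async Task<string> GetAsync(string apiUrl, string query)
    {
        // maxlag=5 asks the servers to reject the request while replication lag is high.
        var url = $"{apiUrl}?{query}&maxlag=5&format=json";
        using var response = await Client.GetAsync(url);

        // A maxlag rejection carries a Retry-After header; back off for the
        // suggested time and try once more.
        if (response.Headers.RetryAfter != null)
        {
            await Task.Delay(response.Headers.RetryAfter.Delta ?? TimeSpan.FromSeconds(5));
            using var retry = await Client.GetAsync(url);
            return await retry.Content.ReadAsStringAsync();
        }

        return await response.Content.ReadAsStringAsync();
    }
}
```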
Edit page statistic:
| Project | Edited pages |
|---|---|
| commons | 6'014'443 |
| dewiki | 95'432 |
| enwiki | 93'122 |
| eswiki | 19 |
| frwiki | 20'453 |
| itwiki | 17'022 |
| plwiki | 25'323 |
| ptwiki | 30 |
- Date: 2025-10-22, Source: query/98360.
Status: (CentralAuth)
| Project | Request | Pages | Edit Description Used |
|---|---|---|---|
| commons | Approved | 31'145'089 | Fix http to https |
| dewiki | Approved | 1'888'381 | Bot: http → https |
| enwiki | Approved | 8'570'327 | Bot: http → https |
| eswiki | Pending | 2'191'542 | Bot: http → https |
| frwiki | Approved | 2'970'187 | Bot: http → https |
| itwiki | Approved | 2'359'233 | Bot: http → https |
| jawiki | Allows global bots | 994'375 | Bot: http → https |
| plwiki | Approved | 1'527'763 | Bot: http → https |
| ptwiki | Pending | 1'214'889 | Bot: http → https |
| ruwiki | Allows global bots | 1'797'992 | Bot: http → https |
| zhwiki | Allows global bots | 1'105'051 | Bot: http → https |