
User talk:GreenC/WaybackMedic 2.5

From Wikipedia, the free encyclopedia

PLEASE HAVE ANOTHER LOOK AT THIS ONE—I find that link is still dead. Cheers, Bjenks (talk) 15:19, 28 August 2020 (UTC)[reply]

Bjenks. The purpose of {{dead link}} is to flag when a link does not have a web archive URL. Once it has a web archive URL the {{dead link}} template is removed. It is redundant to have both {{dead link}} and an archive URL. -- GreenC 15:23, 28 August 2020 (UTC)[reply]

Look at this


Is this still a problem ( https://en.wikipedia.org/w/index.php?title=Bookmarklet&type=revision&diff=904183172&oldid=901585122 https://en.wikipedia.org/w/index.php?title=Date_format_by_country&diff=next&oldid=805456307 )? I have done a bunch of nobots removal (for dead bots, and issues that are now fixed), and this one stood out. AManWithNoPlan (talk) 20:28, 15 May 2021 (UTC)[reply]

Great, AManWithNoPlan, glad someone is checking these!
In Bookmarklet, javascript:location.href='https://web.archive.org/save/'+document.location.href; is causing trouble: the bot tries to convert the /save/ URL to a proper archive URL, which breaks with the +document as a "path". Given the nature of the article I decided to bypass it entirely. There's no other way, such as {{cbignore}}. I don't keep a blacklist (skiplist) for my own bot, though that might be more polite than nobots.
For the other, I can't tell at the moment why it is nobots, but I am seeing a lot of problems in the citations that Medic would normally fix. It's possible there are severe timeout delays at the remote sites, exceeding the ~ 4 hour limit to complete. Trying in expedited debug mode. -- GreenC 21:37, 15 May 2021 (UTC)[reply]
I guess just check https://en.wikipedia.org/wiki/User:AnomieBOT/Nobots_Hall_of_Shame from time to time. I found lots of UNreported bugs in a bot I work on that way. AManWithNoPlan (talk) 13:34, 16 May 2021 (UTC)[reply]
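As an illustration of the Bookmarklet case above, here is a minimal sketch of a guard that recognizes Wayback Machine /save/ endpoints so they could be skipped rather than converted. The function name is hypothetical and this is not taken from WaybackMedic; it only assumes that a save request has the host web.archive.org and a path beginning with /save/.

```python
from urllib.parse import urlsplit

def is_wayback_save_url(url):
    """Return True if url points at the Wayback Machine /save/ endpoint.

    Hypothetical helper for illustration only; not WaybackMedic code.
    """
    parts = urlsplit(url)
    return (
        parts.netloc.lower() == "web.archive.org"
        and parts.path.startswith("/save/")
    )
```

A bot could call such a check before attempting any conversion and leave matching URLs untouched, which would avoid treating the trailing +document fragment as a path.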

Manual option


GreenC, is there an option or a Toolforge process to manually run WaybackMedic or a flag that can be placed within an article to invite the bot for a visit and be included in its next run? Thanks! — WILDSTARtalk 16:19, 7 November 2021 (UTC)[reply]

@WildStar: There is not, sorry. If you want to save dead links you can run IABot on the page (history tab->fix dead links). If it is something you think WaybackMedic is best for, let me know the page name and I'll run it. -- GreenC 17:08, 7 November 2021 (UTC)[reply]

adding in protocol of pages


could this get moved to the 'cosmetic' section? https://web.archive.org/web/20110205011118/http://users.utu.fi/mjranta/reprints/1.%20Rantala1999.pdf and https://web.archive.org/web/20110205011118/users.utu.fi/mjranta/reprints/1.%20Rantala1999.pdf go to the same place. Arlo James Barnes 08:20, 19 May 2022 (UTC)[reply]

Hi User:Arlo James Barnes. I believe it exists if this is what you mean Special:Diff/1088342443/1088614489 -- GreenC 14:21, 19 May 2022 (UTC)[reply]
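For readers wondering what "adding in the protocol" amounts to, the following is a rough sketch, assuming a snapshot URL of the form web.archive.org/web/TIMESTAMP/ORIGINAL. The function name, the regular expression, and the default scheme are illustrative assumptions, not the bot's actual rules.

```python
import re

# Wayback snapshot URL: timestamp of up to 14 digits, an optional two-letter
# modifier such as "id_", then the original URL (its scheme may be missing).
WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{1,14})([a-z]{2}_)?/(.*)$"
)
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")

def normalize_wayback(url, default_scheme="http"):
    """Add a scheme to the original URL inside a snapshot URL if it lacks one.

    Hypothetical helper; default_scheme is an assumption for illustration.
    URLs that do not look like Wayback snapshots are returned unchanged.
    """
    m = WAYBACK_RE.match(url)
    if m is None:
        return url
    timestamp, modifier, original = m.group(1), m.group(2) or "", m.group(3)
    if not SCHEME_RE.match(original):
        original = f"{default_scheme}://{original}"
    return f"https://web.archive.org/web/{timestamp}{modifier}/{original}"
```

Under these assumptions the two URLs quoted above normalize to the same string, which is consistent with the change being cosmetic.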

01760655558 119.30.39.122 (talk) 09:16, 5 April 2023 (UTC)[reply]

Source code for WaybackMedic 2.5


The GitHub repository only has source code for WaybackMedic versions 0, 1, 2, and 2.1. Where is the source code for WaybackMedic 2.5? Solomon Ucko (talk) 00:16, 22 September 2023 (UTC)[reply]

Curious about webcitation-to-archive.org conversion by bot


Regarding this diff, which is described in relevant part as "Rescued 1 archive link," I am curious why a live link to webcitation.org is considered to be in need of "rescue" and conversion to archive.org instead. Is there a WM policy I missed favoring the use of archive.org over other archivers? Is there some concern about the long-term viability or availability of webcitation.org I should know about, deprecating its use? Or maybe the original link was down when the bot checked it, though it was live when I made the edit and when I checked just now? Much obliged for any insight. —KGF0 ( T | C ) 19:23, 22 September 2023 (UTC)[reply]

WebCite was dead for nearly a year and a half with no indication it was ever coming back. There is also an RfC to deprecate it. -- GreenC 16:31, 23 September 2023 (UTC)[reply]

Soundtrack Geek


Hi @GreenC: I noticed that your bot was able to tag a website as "usurped". I was wondering if you could do the same for http://www.soundtrackgeek.com/, which formerly hosted film soundtrack reviews but is now a website for adult content (content advisory!). There aren't very many incoming links, but could you tag those as |url-status=usurped as well? Thanks! InfiniteNexus (talk) 21:08, 1 February 2024 (UTC)[reply]

User:InfiniteNexus: I added it to the queue: Special:Diff/1198014184/1202023308 .. it might take a few months, because I wait for domains to accumulate, since processing them all at once is easier. I notice the link you gave shows a database error, but once usurped a site is at risk, so it will be good to do so. Thanks for the report. -- GreenC 21:59, 1 February 2024 (UTC)[reply]
Thanks. Interesting it now shows a database error; this wasn't the case a year ago. I don't know if the site will be back up. But old links, like http://www.soundtrackgeek.com/reviews/inception-soundtrack-review.php, still redirect to URLs with dirty words. InfiniteNexus (talk) 22:26, 1 February 2024 (UTC)[reply]
I'll make sure the archives are old since the newer archives appear infected. -- GreenC 01:46, 2 February 2024 (UTC)[reply]
Thanks. InfiniteNexus (talk) 18:18, 3 February 2024 (UTC)[reply]
IC Ronna lynn794 (talk) 19:05, 21 April 2024 (UTC)[reply]

FTP


In 2021 your bot did this edit, changing several ftp: URLs into http. But the http ones don't exist. Can that article be fixed, and can the bot be fixed so it won't do that? Please Ping me. Eric Kvaalen (talk) 08:12, 22 April 2025 (UTC)[reply]

User:Eric Kvaalen:
  1. ftp://naif.jpl.nasa.gov/pub/naif/generic_kernels/spk/planets/a_old_versions/de418_announcement.pdf
  2. http://naif.jpl.nasa.gov/pub/naif/generic_kernels/spk/planets/a_old_versions/de418_announcement.pdf
The http works for me. What the bot did back in 2021 was a detailed program, not a simple search/replace: it verified that every http link was working. It also checked whether the FTP was working. There were multiple contingencies; I can't remember the rules offhand. Your post is the first time anyone said something was wrong, and I can't see a problem.
Page status 200 normal. Returning Content-Type: application/pdf. Verified in a browser.

[/home/greenc] ./header https://naif.jpl.nasa.gov/pub/naif/generic_kernels/spk/planets/a_old_versions/de418_announcement.pdf
HTTP/1.1 200 OK
Date: Sun, 27 Apr 2025 18:12:10 GMT
Server: Apache
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Frame-Options: SAMEORIGIN
Last-Modified: Wed, 02 Jan 2008 23:03:14 GMT
Accept-Ranges: bytes
Content-Length: 1973720
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: application/pdf

-- GreenC 18:24, 27 April 2025 (UTC)[reply]
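The header check above can be approximated with the Python standard library. This is only a sketch: the ./header command in the post is the poster's own tool and its behavior is not known here, and the function names below are hypothetical. The test applied (status 200 plus Content-Type application/pdf) mirrors the two facts quoted in the post.

```python
import urllib.request

def head(url, timeout=30):
    """Send a HEAD request and return (status, headers as a dict).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    caller would need to catch that to treat them as "not working".
    """
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, dict(resp.headers.items())

def is_live_pdf(status, headers):
    """Return True for a 200 response whose Content-Type is application/pdf."""
    ctype = ""
    for name, value in headers.items():
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=..." before comparing.
            ctype = value.split(";")[0].strip().lower()
    return status == 200 and ctype == "application/pdf"
```

For example, `is_live_pdf(*head(url))` would combine the two steps for a single URL; the classification is kept separate from the network call so it can be checked without contacting the server.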

  • I see now the ones with archive URLs are not working ideally. FTP is a long story. Some FTP servers have HTTP gateways; as in the PDF example above, you can reach the same content via the http protocol. After about 2020/2021, most web browsers no longer support the FTP protocol due to security issues, except through these https gateways. What is available through the gateway is determined by the website. In this case they have a gateway for the PDF file, but no gateway for directories like ftp://ssd.jpl.nasa.gov/pub/eph/planets/ .. this is not a web link and should not be presented as one, i.e. with square brackets or citation templates. The only way to access this link is through a dedicated FTP client like FileZilla or Cyberduck, or the unix ftp command. This link was probably added prior to 2020/2021, when you could still access ftp:// through a browser. Then that stopped working. I made attempts to fix them as best as possible with a bot run. Now the job is cleanup, and it will be up to the community how they want to handle these. I recommend that when you find these, you try to find a web page equivalent and replace it. If you can't find one, the FTP link will need to be presented in a way that makes the situation clear, i.e. dedicated FTP client required, not supported in browsers by default, https gateway version might exist. We probably need a separate citation template, e.g. Template:Cite FTP -- GreenC 18:51, 27 April 2025 (UTC)[reply]
Opened a new discussion here: Help_talk:Citation_Style_1#TM:CIte_FTP -- GreenC 19:21, 27 April 2025 (UTC)[reply]
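For anyone doing the cleanup described above, a small sketch of how candidate gateway URLs for an ftp:// link might be generated for manual or scripted checking. It assumes, as in the PDF example, that a gateway (if one exists) uses the same host and path; that is an assumption, not a rule, and each candidate still has to be verified. The function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def gateway_candidates(ftp_url):
    """Return https and http URLs with the same host and path as ftp_url.

    Illustrative only: whether either candidate works is decided by the
    website, so each one must be checked before it replaces the FTP link.
    """
    parts = urlsplit(ftp_url)
    if parts.scheme.lower() != "ftp":
        return []
    return [
        urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))
        for scheme in ("https", "http")
    ]

def looks_like_directory(url):
    """Return True if the URL path ends with a slash (a directory listing)."""
    return urlsplit(url).path.endswith("/")
```

Directory links such as the ssd.jpl.nasa.gov one would be flagged by `looks_like_directory`, which matches the observation that those are the cases least likely to have a gateway.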