
Module talk:Webarchive/sandbox

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by GreenC (talk | contribs) at 05:30, 2 September 2018 (Created page with '==Comments== Hello {{yo|Trappist the monk}}, In function serviceName() it strips the hostname assuming "www" or "web", but there is a large variety as documente...'). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Comments

Hello @Trappist the monk:,

In function serviceName(), the hostname is stripped on the assumption that it starts with "www" or "web", but there is a large variety of hostnames, as documented in wp:List of web archives on Wikipedia (in the "Hostname" field). That list is not complete, and remote providers can change their hostnames (all unannounced and undocumented, of course). I used mw.ustring.find() to check for what the hostname includes rather than what to exclude.
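To illustrate the "check what it includes" idea, here is a minimal sketch in Python rather than the module's Lua. The `KNOWN_SERVICES` table and the `service_name` helper are hypothetical names made up for this example; the real hostnames would come from the list linked above.

```python
from urllib.parse import urlsplit

# Hypothetical table for illustration: hostname substring -> service label.
# The authoritative hostnames are in wp:List of web archives on Wikipedia.
KNOWN_SERVICES = {
    "archive.org": "Wayback Machine",
    "webcitation.org": "WebCite",
}


def service_name(url):
    """Return a service label if the hostname contains a known substring, else None."""
    host = (urlsplit(url).hostname or "").lower()
    for needle, label in KNOWN_SERVICES.items():
        # Inclusion test: any prefix ("www.", "web.", "wayback.", ...) is accepted
        # without having to be listed and stripped first.
        if needle in host:
            return label
    return None
```

Because the test is a plain substring match, a new hostname prefix from a provider needs no code change. The trade-off is that short needles can over-match, so more specific entries would need to be checked first.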

It's not uncommon for timestamp years to range from the 1890s to 2100. This is due to many factors, mostly bot bugs and remote archive bugs. These archive URLs will often work even though the times are not literally accurate. Nonsensical timestamps, such as a month of "15", also work on Wayback; a bug in their API produces them. They end up redirecting to a sane timestamp, and my bot WaybackMedic detects and fixes them when it runs across them (not easy, as there are about 5 different redirect types on Wayback, including JavaScript). So ideally the template would still render the archive as intended, assuming in good faith that it is a working archive, but also add a tracking category entry as a warning so that bots can clean it up. -- GreenC 05:29, 2 September 2018 (UTC)
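The "render anyway, but flag for cleanup" behaviour could look roughly like the following Python sketch (again not the module's Lua). The function name `check_timestamp`, the 14-digit format check, and the `min_year`/`max_year` bounds are all assumptions chosen for illustration, not values taken from the module.

```python
import re
from datetime import date


def check_timestamp(ts, min_year=1996, max_year=None):
    """Return (render, needs_tracking) for a 14-digit YYYYMMDDhhmmss timestamp.

    min_year and max_year are assumed bounds for illustration only.
    """
    if max_year is None:
        max_year = date.today().year
    m = re.fullmatch(r"(\d{4})(\d{2})(\d{2})\d{6}", ts)
    if m is None:
        # Not a 14-digit timestamp at all: nothing sensible to render.
        return (False, True)
    year, month, day = (int(g) for g in m.groups())
    suspicious = (
        not (min_year <= year <= max_year)
        or not (1 <= month <= 12)
        or not (1 <= day <= 31)
    )
    # Well-formed but implausible timestamps are still rendered (good faith),
    # and only flagged so that a tracking category can be added for bots.
    return (True, suspicious)
```

Here a timestamp with month "15" or year 1895 would still be rendered, with the second value telling the caller to add the tracking category.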