Module talk:Webarchive/sandbox
Comments
Hello @Trappist the monk:,
In function serviceName() it strips the hostname assuming "www" or "web", but there is a large variety as documented in wp:List of web archives on Wikipedia (in the "Hostname" field). This list is not complete and can change by remote providers (all unannounced and undocumented of course). I used mw.ustring.find() to check for what it includes rather than what to exclude.
It's not uncommon for timestamp years to range from 1890s to 2100. This is due to many factors mostly bot bugs and remote archive bugs. These archive URLs will often work despite not being literally accurate times. Also timestamps with a month of "15" etc, that are nonsensical, they in fact work on Wayback - it's a bug in their API that produces these timestamps. They end up redirecting to a sane timestamp and my bot WaybackMedic detects and fix them when it runs across them (not easy as there are about 5 different redirect types on Wayback including Javascript) -- so ideally the template would still render the archive as intended, assuming good faith it is a working archive, but also leave a tracking category warning entry for bots to cleanup. -- GreenC 05:29, 2 September 2018 (UTC)