Module talk:Webarchive
Perma.cc support
I added support for the Perma.cc web archiver. In the process I also added the ability to call the webarchive function with arguments (e.g. as {{#invoke:webarchive|webarchive|url=http://perma.cc/F9NT-22AK|date=2015-04-09}}), in addition to the preexisting {{#invoke:webarchive|webarchive}} (which grabs arguments from the parent). This makes the module easier to invoke from Scribunto test cases, which I added at Module:Webarchive/testcases and Module talk:Webarchive/testcases. —RP88 (talk) 18:26, 15 November 2016 (UTC)
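The two calling styles described above can be sketched as a small Python model. This is not the module's Lua code: Frame is a hypothetical stand-in for a Scribunto frame, and the rule that arguments passed directly to #invoke take precedence over the parent template's arguments is an assumption about how such a module might choose between them.

```python
class Frame:
    """Hypothetical stand-in for a Scribunto frame: its own args plus an optional parent."""

    def __init__(self, args, parent=None):
        self.args = dict(args)
        self.parent = parent


def resolve_args(frame):
    """Pick the argument set to use.

    Assumption: if #invoke was given any arguments directly (as a test case
    would do), use those; otherwise fall back to the arguments of the calling
    (parent) template, as {{#invoke:webarchive|webarchive}} does.
    """
    if frame.args:
        return dict(frame.args)
    if frame.parent is not None:
        return dict(frame.parent.args)
    return {}


# Direct call with arguments, as from a test case.
direct = Frame({"url": "http://perma.cc/F9NT-22AK", "date": "2015-04-09"})
print(resolve_args(direct))

# Template-style call: #invoke has no arguments, the parent template has them.
wrapper = Frame({}, parent=Frame({"url": "http://perma.cc/F9NT-22AK"}))
print(resolve_args(wrapper))
```

A module could instead merge the two sets; choosing one or the other keeps the behaviour of the preexisting parent-frame call unchanged.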
"at" versus "on"
Re: this[1]
If the rendered text said "Archived on Wayback Machine" I could see this, but most of the time it says "Archived August 3, 2015 at Wayback Machine", which is shorthand for "Archived on August 3, 2015 at the Wayback Machine". If we said "Archived on August 3, 2015 on the Wayback Machine", it's a repetitive use of "on" and it doesn't sound right. What would "on Wayback Machine" be shorthand for in this context? It's confusing. "at the Wayback Machine" is unambiguous and avoids repetition. -- GreenC 14:28, 23 May 2017 (UTC)
- Hi, Green Cardamom
- "it's a repetitive use of "on" and it doesn't sound right." Oh, yes, the cursed "it doesn't sound right", which I heard for years from students who wrote incorrect answers in exams because the correct answer didn't sound right. That's why I generally don't care whether things sound right or not; I apply the tried and proven principle.
- Maybe it doesn't sound right, but it is nonetheless right. Contents are hosted on websites, not at them. It is a simple matter of collocation.
- Also, you reverted bad insertions of "the", against which MOS:COMPUTING has warned. I bet they sounded right to you.
- Best regards,
Codename Lisa (talk) 15:13, 23 May 2017 (UTC)
- You're going to need to get consensus for these changes. Thanks. -- GreenC 16:02, 23 May 2017 (UTC)
- MOS:COMPUTING, in the role of guideline, represents consensus. —Codename Lisa (talk) 05:01, 24 May 2017 (UTC)
- There is consensus: this template has been in use for nearly a decade in its original form and wording at Template:Wayback, and forcing through a change in style across hundreds of thousands of articles is going to need consensus. There are issues to consider, such as how this change will impact existing citations and how they are worded in context. You can use the MOS as your argument, but it's not hard policy. -- GreenC 13:01, 24 May 2017 (UTC)
Off-topic
- @Green Cardamom: Good morning. I am in a very good mood and ready to compromise.
Let's say it is controversial. Let's say my initial edit was wrong. Let's say my reinstatement of the edit was wrong. Let's say, as you reiterated many many times, a discussion is needed. Very well. Let's discuss.
- Let me cut to the chase. You already stated your concern: it was like this for ten years. Per WP:SILENCE, being like this for ten years constitutes consensus. If the community didn't like it, the community would have changed it. Or would it? Let's look at it from several angles:
- A template, for ten years, has misspelled "fast car" as "fart car". Would you say there is a consensus in favor of the latter and it must be kept?
- Were those templates edit-protected? (The answer is "yes". It has been edit-protected for nine years, since 2008.) If yes, the community's lack of action was out of inability, not consent. They probably saw it, wanted to change it, ran into the block, and aborted because they didn't want to go through our faulty and unpleasant consensus-building process.
- Did the ... [Abrupt stop.]
- I am literally paraphrasing the contents of Wikipedia:Silence and consensus § Silence is the weakest form of consensus, so allow me not to write the third, fourth and fifth entries. (You can read them there.) The point is: silence is the weakest form of consensus. And we have a much stronger form of consensus here that overrules this weakest form: a guideline. Guidelines represent broad community consensus. Of course, they must be treated with common sense and occasional consideration of exceptions. If you have grounds for an exception based on common sense, I am all ears. Blow me away.
- Best regards,
16:49, 26 May 2017 (UTC)
Off-topic
Also, please be careful when making significant changes to a production Module that is so heavily used. If there is any chance of a technical problem and/or disagreement, the correct way is to update the sandbox, wait a day or two for comment, and then copy it into the live Module. There are performance and backlinks database consequences with reverts. -- GreenC 14:31, 23 May 2017 (UTC)
Hi, thanks for the reboot and your patience over the weekend. I've read through the guidelines, considered your position and understand it. Below are some thoughts and comments. There are two issues: the use of "the", and "on" versus "at".
"The" or not "the"
- 1a. The guideline on "the" is pretty clear.
- 1b. The use of "the" in {{webarchive}} is somewhat non-standard compared to other external link templates.
- 1c. The use of "the" for everything but "the Wayback Machine" was added by me recently and has no real history of use. It was added to remain consistent with "the Wayback Machine" during the template merger, not out of any preference.
- 1d. Given this, I don't see why we can't make a decision to remove "the", unless there are other objections.
"at" vs "on"
- 2a. Wikipedia overwhelmingly uses "at" in external link templates. Category:External link templates lists over 500 templates. I could not find a case of "on". I didn't look through them all, but extrapolating from the templates starting with "A" and "B", there are hundreds of cases of "at". If needed, I'll manually check each one and build a chart showing that the overwhelming majority use "at" and few if any use "on". But you can browse through and see for yourself.
- 2b. {{webarchive}} is a merger of {{wayback}}, which goes back nearly a decade without any comment or concern about using "at", despite it having an active talk page with many requests for changes over the years. It's used in over 160,000 articles, probably with 2x-3x as many instances. It's used in external link sections, inline citations, infoboxes and talk pages. The issue you raised about the template being protected doesn't seem too significant, because editors have historically asked for changes. Template talk:Wayback was deleted during the merger, but we can restore it; there are many cases there of editors asking for changes.
- 2c. The guideline on collocation is for "strange forms of language". "At" is commonly used and understood, as seen across hundreds of external link templates (2a). We also use "at" in a few places in the CS1 template documentation, and I'm sure in many other places on Wikipedia. It's evidently not "strange" wording, so the case for applying this guideline here is weak.
- 2d. Given the tradition of using "at" in external link templates, it would be inconsistent to have one template say "at", followed by another that says "on". For example:
==External links==
* Works at YouTube
* Works on Wayback Machine
* Works at New York Times
- We try to keep wording consistent across templates.
Comments
Over the years, "at" emerged as the unspoken but de facto standard for external link templates on Wikipedia. None of this means it can't be changed, but it probably means we need wider community discussion before making a change, given the impacts. The best way I know of is an RfC. It's a simple yes-or-no question, and given the opportunity, many editors would like to participate. The RfC may close in favor of "on", but it could also close in favor of changing all templates because of the consistency issue in 2d. Thus those templates' talk pages should also be notified about a consensus discussion that could impact them.
Alternatively, we can compromise: keep "at", remove "the", and let sleeping dogs lie. Also, I'm not sure how you ended up here originally, but there are other ways to customize the display output. For example, if you want a very abbreviated version for use in infoboxes, we can do it using the |format= switch; it could eliminate both "at" and "on" entirely.
(If possible, please post replies below, referencing the section numbers, instead of mixing threads inline above. Also, please take as much time as you need.) -- GreenC 14:10, 30 May 2017 (UTC)
- Greetings, Green Cardamom
- I normally don't use a greeting more formal than "hey", but this time I did, to convey that I am proud that both you and Codename Lisa eventually proved that you are worthy of the trust admins placed in you when granting this privilege. We will gradually forget the initial stages of this conflict.
- So, I am proposing a compromise. You said "I don't see why we can't make a decision to remove "the"". I suggest you two perform this action and leave "at" as it is. In addition, through a gentlemen's agreement, refrain from further changing "at" to "on" in similar templates until such time as you two can conduct a full-scale unification attempt. (I am not asking you to revert anything either of you might have done in the past, just to stop doing it.)
- Alright, Green Cardamom and Codename Lisa, if you two have an accord, I would like both of you to write * '''Support compromise''' and sign it, to form an iron-clad consensus.
- FleetCommand (Speak your mind!) 14:38, 30 May 2017 (UTC)
- Support compromise. Why not. You know, Green Cardamom, ever since FleetCommand posted this message of his with the bloated salutation, I couldn't stop thinking about the Spider-Man film in which Spider-Man says "Shut up, noisy kid. Let mommy and daddy have a talk!" But maybe he is right. A little formality can go a long way. Of course, if you don't support it, we can still talk about it. —Best regards, Codename Lisa (talk) 08:15, 31 May 2017 (UTC)
- Ha, ha. Very funny. Judge all you want. I only meant to be appreciative, not bossy. FleetCommand (Speak your mind!) 10:03, 31 May 2017 (UTC)
- Yes, I agree that is the best way. Please go ahead; I don't know what should retain "the". I know Wikipedia is making an effort/priority towards civility and conflict resolution (building new tools etc.), so I see FleetCommand as part of the movement in that direction and don't take it personally. -- GreenC 13:37, 31 May 2017 (UTC)
Bad bot edits
Please add a tracking category so that we can locate bad edits like this. --Redrose64 🌹 (talk) 22:20, 18 July 2017 (UTC)
- "insource:/\_\_FORMAT/" shows 344 articles affected. I can provide the list of names if you want. -- GreenC 23:03, 18 July 2017 (UTC)
- Considering the number of edits made by the bot, it's better than I thought. I left a note at User talk:Cyberpower678#InternetArchiveBot: strange FORMAT parameter, which you have clearly seen by now. Thank you --Redrose64 🌹 (talk) 08:03, 19 July 2017 (UTC)
- I've cleaned up the last sixty-odd articles affected. One was very strange; I felt it best to revert the whole ref back a few years and let the bot try again. --Redrose64 🌹 (talk) 14:52, 9 September 2018 (UTC)
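For checking a page's wikitext offline, the insource search above can be approximated with a short Python sketch. It assumes the bad edits leave the literal text "__FORMAT" inside a {{webarchive}} call (the exact form of the bad value isn't shown in this thread), and the simple pattern used here skips calls that contain nested templates.

```python
import re

# A {{webarchive ...}} call with no nested templates; the template name is matched case-insensitively.
_WEBARCHIVE_RE = re.compile(r"\{\{\s*webarchive\b[^{}]*\}\}", re.IGNORECASE)


def find_format_placeholders(wikitext):
    """Return the webarchive calls whose text contains the literal '__FORMAT'."""
    return [
        m.group(0)
        for m in _WEBARCHIVE_RE.finditer(wikitext)
        if "__FORMAT" in m.group(0)
    ]


# Hypothetical sample: the first call carries a placeholder, the second is clean.
sample = (
    "{{webarchive |url=https://web.archive.org/web/2007/x |format=__FORMAT__}} "
    "{{Webarchive |url=https://web.archive.org/web/2008/y}}"
)
for call in find_format_placeholders(sample):
    print(call)
```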
Nil host
Recent edits have caused a script error at Milkovich v. Lorain Journal Co.#External links. A demonstration follows:
{{webarchive |url=webarchive.loc.gov/all/20041221152648/http://www.firstamendmentcenter.org/faclibrary/case.aspx?id=1472 |title=Example |date=2004-12-21 }}
→ Error in Webarchive template: Invalid URL.
The problem is that line 398 includes .. ulx.url1.host, but host is not defined in ulx.url1. That is presumably due to mw.uri.new deciding that the URL had no host.
By the way, there are several global variables in the code that would ideally be refactored to remove them. I don't have time to offer more than these comments at the moment, but in case anyone wants to look, these globals are probably ok (but should be refactored):
- args, fulldate, maxurls, track, ulx
while these globals are probably mistakes:
- cday, plain
Johnuniq (talk) 00:54, 3 June 2018 (UTC)
- The article was just fixed (diff) by inserting http:// in front of the URL. Nevertheless, the module should do something different if http is missing. Johnuniq (talk) 00:59, 3 June 2018 (UTC)
- cday and fulldate are errors, fixed. The cday error didn't show up because no one uses the YDM date format. plain is also an error: it should be the boolean true, not a flag called 'plain' (I was misreading the manual). No harm done; it didn't matter, as it was using "magic" characters anyway (that is also fixed). args, maxurls, track and ulx are global: passing them around as arguments increased code complexity, so I kept them global even though it's not ideal. This isn't a large program that requires data isolation, so it's a tradeoff. If someone really finds it a problem, they can refactor. The protocol scheme is a problem; I think the current solution will handle most scenarios, including protocol-relative and mixed-case URLs. It doesn't check for other protocols like ftp, as I've never seen them in archive URLs. If there is a better idea, have at it. -- GreenC 03:00, 3 June 2018 (UTC)
- That's good thanks. Johnuniq (talk) 05:05, 3 June 2018 (UTC)
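The scheme handling discussed in this thread (missing scheme, protocol-relative and mixed-case URLs) can be sketched in Python. This is an illustration under assumptions, not the module's code: a URL with no scheme gets "http://" prepended, as in the Milkovich fix; a protocol-relative URL is assumed to mean https; and any scheme other than http/https (e.g. ftp) is rejected rather than passed through. Returning None stands in for reporting "Invalid URL" instead of crashing on a missing host.

```python
import re
from urllib.parse import urlsplit

# Matches a leading "scheme://" and captures the scheme name.
_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*)://")


def normalize_archive_url(url):
    """Return url with a lowercase http/https scheme, or None if it has no usable host."""
    url = url.strip()
    if url.startswith("//"):
        # Protocol-relative: assumed to mean https.
        url = "https:" + url
    m = _SCHEME_RE.match(url)
    if m is None:
        # No scheme at all (the Milkovich case): assume http.
        url = "http://" + url
    elif m.group(1).lower() in ("http", "https"):
        # Mixed-case scheme such as "HTTPS://": lowercase only the scheme.
        url = m.group(1).lower() + url[m.end(1):]
    else:
        # Other schemes (ftp, etc.) are rejected in this sketch.
        return None
    host = urlsplit(url).hostname
    return url if host else None


print(normalize_archive_url(
    "webarchive.loc.gov/all/20041221152648/http://www.firstamendmentcenter.org/faclibrary/case.aspx?id=1472"
))
```

Note that the scheme check is anchored at the start of the string, so the embedded "http://" inside a Library of Congress or Wayback archive path is not mistaken for the archive URL's own scheme.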
Date glitch
I'm hoping someone will fix the following problem, which is slightly simplified from here.
{{webarchive |url=https://web.archive.org/web/20070519171050/http://www.example.com|title=Example|date=19 May, 2007}}
→ Example at the Wayback Machine (archived 2007-05-19)
The date has a comma in it, which results in df being nil, and that causes formatDate to crash (it currently shows "string expected, got nil"). By the way, the doc page has a red link "MediaWiki:Webarchive" at the top due to confusion regarding {{lm}}. I had a hand in that, but I don't currently have the patience to sort it out. Johnuniq (talk) 09:06, 13 October 2018 (UTC)
- Fixed: failure to return a fallback date format when decode_date() couldn't make sense of what it was given. I have been wondering about that redlink. I'll delete it, shall I? —Trappist the monk (talk) 11:15, 13 October 2018 (UTC)
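The fix described above (always returning a fallback date format when the date can't be decoded) can be sketched in Python. The format names ("iso", "dmy", "mdy"), the list of accepted patterns, and the choice of "iso" as the fallback are assumptions for illustration; decode_date here is a hypothetical analogue of the module's function, not a port of it.

```python
from datetime import datetime

# Hypothetical table of (strptime pattern, name of the date format it represents).
_PATTERNS = [
    ("%Y-%m-%d", "iso"),
    ("%d %B %Y", "dmy"),
    ("%B %d, %Y", "mdy"),
]


def decode_date(text, fallback="iso"):
    """Return (date or None, format name); the format name is never None."""
    cleaned = text.strip()
    for pattern, name in _PATTERNS:
        try:
            return datetime.strptime(cleaned, pattern).date(), name
        except ValueError:
            continue
    # Nothing matched (e.g. "19 May, 2007"): still hand back a usable format
    # name, so later formatting code never receives a nil/None format.
    return None, fallback


print(decode_date("19 May 2007"))
print(decode_date("19 May, 2007"))
```

With this shape, the caller's formatting step only has to decide what to show when the date itself is None; it can no longer fail on a missing format.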