
Template talk:Infobox language/Archive 8

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Lowercase sigmabot III (talk | contribs) at 01:42, 19 September 2017 (Archiving 1 discussion(s) from Template talk:Infobox language) (bot). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

RfC: What should the language infobox display when editors have not found any speaker figures?

The following discussion is an archived record of a request for comment. Please do not modify it. No further edits should be made to this discussion. A summary of the conclusions reached follows.
There is consensus to display nothing. The majority opinion is that displaying "no data" is confusing because several different reasons could explain the wording, and that showing nothing already indicates there is no data. The proposal has no consensus, as the participants in the RfC who replied are too evenly split on it. AlbinoFerret 19:17, 29 December 2015 (UTC)

In the case that editors have looked for speaker figures, but have not found any, they can set the parameter |speakers= of Template:Infobox language to ?. This currently causes the infobox to display “Native speakers (no data)”. There are two questions:

  1. Should we display something in this case, or should we display nothing?
  2. If we should display something, then what should it say?

--mach 🙈🙉🙊 09:35, 29 November 2015 (UTC)

Survey

  • Don’t display anything. – When there is lack of information about an infobox item such as native speakers, then this item should be concealed, see MOS:INFOBOX#Causes of inconsistency. The infobox is no place to tell the readers that editors have looked for speaker figures, but have not found any. The current wording “(no data)” is unclear and ambiguous: It does not actually tell the reader that editors have looked for speaker figures, but have not found any; and we cannot possibly verify that there is “no data” – at best, we can verify that a particular source does not contain any data. --mach 🙈🙉🙊 09:35, 29 November 2015 (UTC)
  • Both. I have made up my mind; see my reasoning and proposed solution below. I do want to point out an important distinction. There are two reasons for a lack of speaker data: one is that the data simply have not been collected, and the other is that the data exist but we editors haven't found them yet. I think we maybe should make that distinction, but I'm not sure how, or whether it's actually practical on the whole. Wugapodes (talk) 21:07, 29 November 2015 (UTC)
    Also, regardless of what conclusion we come to, I strongly oppose using footnotes like that one that was added a few days ago. It messed with a number of articles, including one I have at FAC, by including a note in a reference section. Hard-coding a note into a template leaves very little customizability for editors which is problematic for something as variable as reference and footnote formatting. Wugapodes (talk) 21:14, 29 November 2015 (UTC)
    How would you know the difference between a piece of information that hasn't been collected and one that you haven't found? Under what circumstances would we be able to provably, citably state that a figure has been collected, but not be able to state what that figure was? This "important distinction" would simply give a place for the editor's personal opinion of whether or not the information is out there somewhere - which doesn't sound very wiki to me.Chuntuk (talk) 16:37, 30 November 2015 (UTC)
    If a source states that there's no estimate, we can say that, otherwise we presume we as editors haven't found one (regardless of the reason we haven't). There are four possible situations: 1) an estimate exists and editors have found it in a reliable source, 2) an estimate does not exist and editors have found that statement in a reliable source, 3) an estimate exists and editors have not found it in a reliable source, 4) an estimate does not exist and editors have not found an estimate in reliable sources (because it isn't there). Those last two are functionally the same, so there are three knowledge conditions editors can have: we know an estimate (#1), we know we can't know an estimate yet (#2), and we don't know if we can know an estimate (#3 and #4). We need a way to distinguish between those last two conditions. The last one is a default, we don't know what we don't know, but when we do know what we don't know (and it's stated so in reliable sources) we need a way to put that in the infobox. We can't conflate those two because it's an important distinction that some articles need to make. Wugapodes (talk) 23:31, 30 November 2015 (UTC)
    That's why whenever we place "No data" in the infobox there should either be a paragraph in the article or a footnote which describes the lack of data in reliable sources and records our efforts to find those data. That will give the ambitious editor a clear jumping off point if he/she wishes to try and find the data in some utterly obscure source rather than having to trek through the already examined places. --Taivo (talk) 00:32, 1 December 2015 (UTC)
    I disagree as I feel that 1) it can easily be construed as self referential and 2) it will lead to a lack of consistency which is the opposite of what we want in an infobox. Now that I've put more thought into it, I think that we should display "(no data)" in cases of #2 from above (reliable source says no estimates available) and display nothing otherwise. I'm going to come up with a proposal below. Wugapodes (talk) 02:33, 1 December 2015 (UTC)
  • Display the text "No data" and then manually add a footnote that indicates why there are no data. The footnote should not be "hard code", but customized for each language/dialect article where it appears. If there is already a section in the article that discusses the lack of speaker figures, then the text in the infobox should read "See section XXX". --Taivo (talk) 01:54, 30 November 2015 (UTC)
The reason we must display something is that this is one of the major pieces of information that readers will be looking for when they come to a language/dialect article. To just leave it blank violates principles of usability. It requires readers to waste their time looking for other sources when we have already looked and not found anything. If some editor in the future finds something in an obscure source written in Swahili that no one has seen before, this is not the beginning of the apocalypse. It just means that the new editor changes the entry in the infobox. There are no criminal charges filed by wikilawyers against past editors who looked, but failed to find any valid information in reliable sources. But we waste our readers' time by not providing them the key piece of information from the beginning--that we've already made a good faith effort to find the information, but it's not out there. --Taivo (talk) 22:37, 30 November 2015 (UTC)
There is no evidence that speaker figures are “one of the major pieces of information that readers will be looking for when they come to a language/dialect article” – that is just a POV by you and kwami. Not even the Ethnologue says anything about speaker figures when there are none (see [1]). --mach 🙈🙉🙊 10:06, 1 December 2015 (UTC)
It's too bad that what was a decent, well-mannered discussion had to be turned into your continued personal attack against me and Kwami. You have no evidence that speaker numbers are not one of the primary pieces of information that readers are looking for when they first investigate a language/dialect article. Whether Ethnologue displays that information or not is immaterial to the discussion here. --Taivo (talk) 13:54, 1 December 2015 (UTC)
I am merely pointing out that the burden of proof is always on the side of those who want to include an information. --mach 🙈🙉🙊 15:26, 1 December 2015 (UTC)
  • Don’t display anything. "No data" is a positive statement that no data exists anywhere, and I'm not sure how often we can be confident that that is the case.©Geni (talk) 22:26, 30 November 2015 (UTC)
  • Display nothing, "Wikipedians haven't found the sources", "the research has not been done", and "it's impossible for the research to ever be done" are all three different cases that "no data" confuses. We don't use "no data"-kinds of meta-commentary for other fields in infoboxes. They're infoboxes, not "WeDunnoBoxes".  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  18:22, 1 December 2015 (UTC)
  • Display the text "No data". "No data" is ambiguous enough that it doesn't violate OR, despite Mach's insistence that it makes a specific claim. (It could simply mean no data is available to WP, and even if we were to find a source that says no data is available, that only means no data is available to them, not that there is no data at all.) If an editor likes, an explanatory note can be added via the ref= field. — kwami (talk) 19:54, 1 December 2015 (UTC)
What I am insisting on is precisely that a mere “no data” is too ambiguous. --mach 🙈🙉🙊 12:19, 6 December 2015 (UTC)
We don't want to make it more precise because we can't count on every article being the same. I think "no data" is just fine. You object that "no data" is too ambiguous, but also that we can't make it unambiguous because that's OR, so your only solution is to hide our ignorance from the reader, which is at best a disservice and at worst dishonest. — kwami (talk) 00:15, 13 December 2015 (UTC)
I am not saying that we can’t make it [“no data”] unambiguous because that’s OR – what I am saying is that it is too ambiguous: It can be read as a disclosure of our ignorance or it can be read as a verifiable claim. --mach 🙈🙉🙊 11:16, 13 December 2015 (UTC)
  • Don't display anything (as the Ethnologue does in such cases). Later editors may be helped by a record of the sources in which no information was found in a comment or (probably better) on the talk page, but this is editorial data that does not belong in the visible text and would be unhelpful clutter in the infobox. Kanguole 11:46, 6 December 2015 (UTC)
  • Don't display anything if we can't find a reliable source for the estimate or for the fact that there is no estimate. Display "no estimate exists" if there is a reliable source for that. Don't use "no data" because readers won't know if that means there isn't any data anywhere or just that the authors of the article didn't have any. Bryan Henderson (giraffedata) (talk) 03:43, 14 December 2015 (UTC)
That's not a relevant distinction. There's generally no way to know if there's no data anywhere. All we can know is what is in our sources, and "no data" sums that up adequately. — kwami (talk) 01:56, 17 December 2015 (UTC)
I disagree. The wording “no data” only adequately sums up a source if the source affirms that there is no data. It does not adequately explain to the reader that editors have looked for sources, but have not found any. --mach 🙈🙉🙊 09:54, 26 December 2015 (UTC)
  • Don't display anything I can't see any way that "no data" is less ambiguous than not showing anything. The only time I can see this being what should be written is when a reliable source has stated "there is no data about speaker figures". If we simply can't find a source we shouldn't make any statement about the availability of the information. Sam Walton (talk) 13:18, 29 December 2015 (UTC)
  • Don't display anything Lack of information is expressed by ... (surprise!) no information being displayed. There is a virtually infinite number of places in an encyclopedia, especially in a constantly edited encyclopedia, with a lack of information. The only real and practicable solution is to leave it off. Otherwise you would have to read all the time "no data found", "we had no time to add the lack of information", "we know, there must be more, but it is still missing", and so forth. That would be substantially ridiculous. -- ZH8000 (talk) 14:40, 29 December 2015 (UTC)

Proposal by Wugapodes

Going off my comment above about the distinction between a reliable source saying no estimates are available and we just not having any, I propose the following:

  • When a reliable source states there's no estimate available, we display (No Data) as that is a statement we would have evidence for. Otherwise the space is left blank and nothing is displayed.
  • We would have "?" produce "(No Data)" so that current implementations aren't broken.
  • We would include in the template documentation the distinction; something along the lines of If you don't have an estimate, leave the parameter blank unless a reliable source says no estimate is available. In that case, use "?".
  • We add a "|ref=" parameter that would append a citation for the claim of no data.
  • We would have a maintenance category for any article that uses "?" without "|ref=" so they can be fixed by editors at their leisure.

Wugapodes (talk) 02:33, 1 December 2015 (UTC)
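
The proposal above can be read as a small decision rule. The following is only a sketch in Python, not the template's actual wikitext: the function name, the return type, and the maintenance category name are hypothetical, since the proposal does not name a category. It assumes that a blank |speakers= hides the row, that "?" shows "(No Data)" followed by the citation from |ref=, and that "?" without |ref= is flagged for cleanup.

```python
from typing import List, NamedTuple, Optional

# Hypothetical category name; the proposal does not specify one.
MISSING_REF_CATEGORY = "Language articles with speakers set to '?' but no ref"


class SpeakersCell(NamedTuple):
    text: Optional[str]      # what the infobox row shows; None hides the row
    categories: List[str]    # maintenance categories added to the article


def render_speakers(speakers: str = "", ref: str = "") -> SpeakersCell:
    """Sketch of the proposed display rule for |speakers= and |ref=."""
    speakers = speakers.strip()
    ref = ref.strip()
    if not speakers:
        # No estimate known to editors: display nothing.
        return SpeakersCell(None, [])
    if speakers == "?":
        if ref:
            # A reliable source says no estimate is available.
            return SpeakersCell("(No Data)" + ref, [])
        # "?" without a citation: still displayed, but flagged for editors.
        return SpeakersCell("(No Data)", [MISSING_REF_CATEGORY])
    # Ordinary case: a speaker figure, optionally followed by its citation.
    return SpeakersCell(speakers + ref, [])
```

Under these assumptions, `render_speakers("?")` keeps existing uses of "?" working while adding the article to the cleanup category, which matches the intent that current implementations are not broken.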

That seems quite reasonable to me. I think, though, that we do not need the special case where the parameter |speakers= is set to ?. When there is a reliable source that says there is no data, it is simpler and clearer that this be encoded as |speakers=no data and |ref={footnote text linking to reliable source}. Therefore, I would rather remove the special ? case after it has been removed from the articles (using the maintenance category you are proposing). There are probably little more than a hundred articles that use |speakers=?, see the tentative list at User:J. 'mach' wust/sandbox#2015-11-24 list of pages that have speakers=? in the language infobox, and most of them are rather stubs. --mach 🙈🙉🙊 10:17, 1 December 2015 (UTC)
This proposal seems to have no overlap with how |speakers=? is used now (to record that a search of the literature has not turned up a figure). Those instances that also use the |ref= field either give an explanatory note (e.g. Sekele language) or link to a single source that does not contain a speaker figure (e.g. Akuku language). I don't know of any that cite a source saying no estimates are available. Such would be rare, as when people are interested enough to document the problems with getting figures they typically discuss the limits of the uncertainty (e.g. Sentinelese language). Certainly it would be rare enough to be best handled at individual articles rather than the coding of the infobox template. Kanguole 18:42, 1 December 2015 (UTC)
I agree with Kanguole that this isn't useful. — kwami (talk) 19:56, 1 December 2015 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Implementation of RfC consensus

I have now edited the code so the information no longer shows up in the infobox as per the above RfC consensus [2]. The articles with |speakers=? can be tracked in Category:Language articles with speakers set to 'unknown' (a maintenance category that apparently existed all along). I have added a note about that in the template documentation [3].

The articles with |speakers=? that have a link to a source in |ref= can now be looked up in the new maintenance category[4] Category:Language articles with speakers set to 'unknown' despite a reference. I have looked up whether any of them pointed to a source that affirmed the lack of data. None did, so I removed the |ref= parameter. I have left the |speakers=?, so these articles are now listed with the others in Category:Language articles with speakers set to 'unknown', and I added a hidden comment as to which source had been consulted already. --mach 🙈🙉🙊 04:30, 30 December 2015 (UTC)

@kwami: I have two issues with your recent edit [5]:
  1. You are saying that it “messes up tracking cats; all articles now tagged as lacking refs”. That is not true. Only those articles are tagged as lacking refs that have |speakers=? and at the same time |ref=⟨some ref⟩ – you probably have overlooked that I have reordered the code [6] so the tracking cats are not messed up and not all articles with |ref=⟨some ref⟩ are tagged. I have merely introduced a new tracking category – Category:Language articles with speakers set to 'unknown' despite a reference – for keeping track of exactly these articles. With your edit, keeping track of these articles is extremely difficult. Is there any reason why you want us not to keep track of these articles?
  2. Your reintroducing the “(no data)” wording in hidden comments makes no sense and is contrary to the consensus of this RfC.
I have reverted it in the spirit of WP:BRD. Now is the time for discussion. --mach 🙈🙉🙊 06:01, 31 December 2015 (UTC)
I am sorry – I had my mind all set to Category:Language articles with speakers set to 'unknown' despite a reference while you were referring to Category:Language articles without reference field. I have now fixed the latter category [7].
The current code does not only introduce the new tracking category Category:Language articles with speakers set to 'unknown' despite a reference. It also keeps better track of articles that have |speakers=none in Category:Language articles with speakers set to 'none'. Previously, these articles were only tracked if they had no |ref=. Now, they are all tracked (including articles such as Modern Standard Arabic that were not tracked previously). --mach 🙈🙉🙊 07:40, 31 December 2015 (UTC)
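
The tracking behaviour described in this thread can be summarised as a rough Python sketch; it is not the template code itself. The three category names are taken from the comments above, while the function name is hypothetical. It is also an assumption that an article with |speakers=? and a |ref= lands only in the "despite a reference" category rather than in both, and Category:Language articles without reference field is left out because its exact condition is not stated here.

```python
from typing import List

UNKNOWN = "Language articles with speakers set to 'unknown'"
UNKNOWN_WITH_REF = (
    "Language articles with speakers set to 'unknown' despite a reference"
)
NONE = "Language articles with speakers set to 'none'"


def tracking_categories(speakers: str = "", ref: str = "") -> List[str]:
    """Sketch of which tracking categories an infobox call would receive."""
    value = speakers.strip().lower()
    has_ref = bool(ref.strip())
    if value == "?":
        # Assumption: the two 'unknown' categories are mutually exclusive.
        return [UNKNOWN_WITH_REF if has_ref else UNKNOWN]
    if value == "none":
        # Tracked whether or not |ref= is present.
        return [NONE]
    return []
```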

RfC: What should we do now that Ethnologue has put up a paywall?

On December 1, 2015 Ethnologue announced a new paywall. Users will be allowed to access 7 pages per month free of charge. To access more than this will require a subscription of $9.95/month or $60/year. The Infobox currently links directly to Ethnologue in several places by way of ISO3 codes. Given that Ethnologue no longer allows free access, linking to this source would seem to violate the spirit of Wikipedia. What do people think about changing the Infobox to link to an alternate but open source, such as Glottolog? Gholton (talk) 22:46, 21 December 2015 (UTC)

The codes link to ISO639-3 pages that are on sil.org (which is the registration authority), but presumably will remain freely available. However, these pages are very sparse, and rely on a link to the corresponding Ethnologue page for the denotation of the code. So ISO639-3 codes have become less useful.
Ethnologue is used in many instances of this infobox as a source for speaker population figures, but that use (like the similar use of Nationalencyklopedin) seems compliant with WP:PAYWALL. Kanguole 23:20, 21 December 2015 (UTC)
The Ethnologue or Glottolog are tertiary sources. Instead of pointing to them, we can always point to the individual secondary source they are based upon. --mach 🙈🙉🙊 23:38, 21 December 2015 (UTC)
Actually, Ethnologue often refers to primary sources, and for many languages is a primary source itself. Verifying the sources it does use is always a good idea, but that's more work than we're likely to be able to handle for more than a small fraction of the world's languages. — kwami (talk) 00:17, 22 December 2015 (UTC)
I think Ethnologue is a valuable resource (or, as Kwami alludes to, a collection of valuable resources). Our current usage seems compliant with WP:PAYWALL to me. And, for the casual WP reader, seven pages per month is probably adequate, although, for an editor without a subscription, it can be frustrating (even if there are easy ways around it).--William Thweatt TalkContribs 02:39, 22 December 2015 (UTC)
(ec) As long as our links are to the ISO, then they remain free. And linking to a pay site is not unheard of in Wikipedia, e.g., all those links to Britannica or to media sites that are behind paywalls. The user must decide if verifying the Wikipedia information is important enough to pay for it. And the information on Ethnologue and Glottolog is not equivalent. For example, Glottolog doesn't include speaker numbers and the maps are not really useful other than locating the center of mass of a speech community. Its great strength is bibliographical. --Taivo (talk) 02:43, 22 December 2015 (UTC)
I agree with what Taivo said. Peter238 (talk) 14:45, 22 December 2015 (UTC)
There is no link to Ethnologue in the infobox, User:ZH8000. The link is to the ISO 639-3 standard, which is not behind the paywall. And since the ISO 639-3 code is used far beyond Ethnologue, it is important to include. So your comment is irrelevant. --Taivo (talk) 16:06, 29 December 2015 (UTC)
What about providing a separate link in the Infobox to the Endangered Languages Catalog (ELCat)? Gholton (talk) 20:12, 22 January 2016 (UTC)
  • I hadn't seen this discussion and inadvertently started a parallel one at WP:LANGUAGES. I think that we should deprecate the use of Ethnologue data, both because they will probably eventually grow tired of us providing a substantial part of their product for free, and also because when it is not free, it actually makes more sense to use high quality paper sources in all the cases where they are available. I don't think we should remove all links to Ethnologue at once, but we should aim to gradually replace them with higher quality specialist sources. This could be done by simply making a policy saying that other sources are preferred when available.·maunus · snunɐɯ· 00:48, 29 January 2016 (UTC)

Should there be a mention of the basic typology (SVO, VSO, etc) in the infobox? Jimw338 (talk) 17:55, 17 January 2016 (UTC)

Nope. That particular aspect of typology is not more basic than many other aspects - and often a question of interpretation.·maunus · snunɐɯ· 01:24, 29 January 2016 (UTC)

Displaying ISO code of a language for its dialects

I have started a discussion about this template at [8]. I started it on the WP:LANG talk page because more people read that page. I am proposing that we add a mechanism for indicating the ISO code that should be used for a dialect of a language, where the dialect does not have its own code. See discussion there. AlbertBickford (talk) 19:06, 10 March 2016 (UTC)

Linguasphere comment

Urdu currently has a comment embedded in its infobox's lingua field. Unfortunately this gets formatted in typewriter face (I hacked around it by manually adding <code>...</code> tags). I note that there are "comment" fields for all the language-code entries, except Linguasphere. Could one please be added to Linguasphere so this comment doesn't look ugly? Hairy Dude (talk) 01:11, 27 March 2016 (UTC)

Hairy Dude, yes, fixed here. Frietjes (talk) 14:45, 24 July 2016 (UTC)
Thanks. Hairy Dude (talk) 15:11, 24 July 2016 (UTC)

Catalogue of Endangered Languages

I wanted to raise the possibility of adding a link to ELCat, akin to the way the infobox now links to Ethnologue and Glottolog. ELCat makes use of ISO codes, so it would be easy to pass code tcb to http://www.endangeredlanguages.com/lang/tcb. The only catch is that ELCat does not include all languages, only those considered endangered, so we'd need to create behavior to not pass codes which are not in ELCat (or else ask ELCat to more robustly handle nonexistent codes -- right now they give 404's). I think a link in the Infobox could generate lots of useful cross-fertilization between ELCat and Wikipedia. ELCat users are often Native speakers uploading content, and they could contribute productively to Wikipedia as well. Gholton (talk) 23:28, 28 September 2016 (UTC)
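
The "do not pass codes which are not in ELCat" behaviour could look roughly like the Python sketch below. Only the URL pattern comes from the comment above; the function name and the allow-list are hypothetical, and a real list of covered codes would have to be obtained from ELCat or maintained by hand.

```python
from typing import Optional, Set

ELCAT_BASE = "http://www.endangeredlanguages.com/lang/"

# Hypothetical allow-list of ISO 639-3 codes that ELCat covers.
ELCAT_CODES: Set[str] = {"tcb"}


def elcat_url(iso_code: str, known_codes: Set[str] = ELCAT_CODES) -> Optional[str]:
    """Return the ELCat link for a code, or None if ELCat has no entry."""
    code = iso_code.strip().lower()
    if code in known_codes:
        return ELCAT_BASE + code
    # Unknown code: emit no link rather than one that would return a 404.
    return None
```

With such a check, the infobox would simply omit the ELCat row for languages that are not in the catalogue, so readers never follow a dead link.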

Rename "Altaic" in familycolor

It's been six years since I mentioned this would happen, but the fact that the familycolor attribute for a number of mainly Asian languages is "Altaic", a very heavily dismissed and refuted language family in modern linguistics with almost no proponents, is causing large edit wars on Japanese-related and Korean-related articles even if it's just supposed to be "areal". Please consider renaming this colour to anything else. Even a literal value of "arealaltaic" would be much better than "altaic" alone. But given that you have "American", "Australian", "Caucasian" and "Papuan" as values, then "Asian" or "Eurasian" would be good, geographically equivalent descriptors. — Io Katai ᵀᵃˡᵏ 06:13, 14 February 2017 (UTC)

It might be good to create a unique Koreanic-Japonic-Ainu family colour/areal family. Korean and Japanese may not be related, but in terms of areal location and historical connections it would make sense. This would also end the dispute/edit war.

So the areal-Altaic family would be Turkic-Mongolian-Tungusic. And the new Koreo-Japonic(suggested name) family would be Koreanic-Japonic-Ainu. Ovilava (talk) 07:58, 14 February 2017 (UTC)

Except that Ainu has nothing to do with Altaic, which would mean that we'd be coding Korean as Japanese, which given the colonial history would be a disaster.
Whether or not the Altaic languages are actually related, they are typologically very similar. What we call the color is irrelevant to the reader, because they don't see it, but the term in the lit is "Altaic". If we call it Asian or Eurasian, then we'll get edit wars over people adding e.g. English because it's a Eurasian language because they don't understand that we're censoring ourselves to placate idiots.
I'd say leave it as is, or break it up and create three or four new family colors. The problem with the latter is that we've gotten to the point where it's difficult to distinguish the family colors that we have.
No matter what we do, we're going to get occasional IP edit wars. That's the nature of Wikipedia. My advice is to do what we always do in such situations: revert them, block them, or protect the articles. — kwami (talk) 02:55, 21 March 2017 (UTC)

Actually the Altaic theory is obsolete. Typological similarities are no proof of a relationship. And nearly all modern linguists agree that macro-Altaic is debunked and core Altaic is only an areal family. A Koreanic-Japonic-Ainu areal family does not support imperial agendas. Koreanic would not be under Japonic. They would be two independent members. Wikipedia should be up to date.

If you say typologically similar, Korean must also be a Dravidian language. 213.162.68.217 (talk) 07:34, 22 March 2017 (UTC)