Module talk:Lang/data

Module:Lang/data is permanently protected from editing because it is a heavily used or highly visible module. Substantial changes should first be proposed and discussed here on this page. If the proposal is uncontroversial or has been discussed and is supported by consensus, editors may use {{edit template-protected}} to notify an administrator or template editor to make the requested edit.

This is the talk page for discussing improvements to the Lang/data module.

New subtable for proto-languages which should not have a splat by default

@Trappist the monk: Hi - I'd like to add a new subtable for protolanguages which are exceptions to the current rule that the splat (*) is added to the start of text by default, because this doesn't make sense in at least one (possibly two) cases:

Proto-Romance (roa-x-proto), which is to Vulgar Latin what Late Latin is to Classical Latin. It's attested, though often given with normalised spellings (as can be seen in its article), and is an unambiguous exception to the rule.
Potentially Proto-Norse (gmq-x-proto), which is the progenitor of the North Germanic languages. This one I'm less sure about, but it's sparsely attested in the Elder Futhark script, and going from Wiktionary's list of Proto-Norse lemmas (make of that what you will), only a third or so are unattested reconstructions. I would hold off on this one for now, until I get a better feel for how often exceptions are needed.

In any event, I don't think it would be a good idea to deal with Proto-Romance as a special case in Module:Lang itself, so it would make more sense to add a subtable in this module which can be cross-checked:

local no_splat_proto_t = {
	["roa-x-proto"] = true,
}

no_splat_proto_t would obviously need to be returned in the main export table, and the proto_prefix function in Module:Lang would need a corresponding edit, but I could handle that as it's relatively trivial. Theknightwho (talk) 18:43, 4 August 2025 (UTC)

We have |proto= which accepts yes or no; is that not sufficient? When a 'proto' language is not a '*proto' language, is it proper to have a private-use tag that labels it as a '*proto' language?

Could we not create a tag roa-x-noproto instead? We don't support any 'proto' language names that aren't defined as such in Module:Lang/data so, instead of looking at the language name to see if we should mark the text with a splat, we can look at the language tag to see if it ends with -x-proto. Line 888 becomes sommat like:

local function proto_prefix (text, language_tag, proto_param)

and line 891 becomes sommat like:

elseif (language_tag:find ('%-x%-proto$') or (true == proto_param)) then

roa-x-noproto would fail that test (assuming proto_param is false) so no splat would be applied in the rendering. This would also allow for both:

["roa-x-noproto"] = "Proto-Romance",
["roa-x-proto"] = "Proto-Romance",

where the distinction between -x-noproto and -x-proto is obvious to editors who are reading the wikitext.

—Trappist the monk (talk) 19:50, 4 August 2025 (UTC)
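For illustration, the tag-based test proposed above might slot into proto_prefix roughly as follows. This is a minimal sketch, not the actual Module:Lang code: it assumes proto_param is true for |proto=yes, false for |proto=no and nil when unset, and it leaves out anything else the real function does (for example, handling text that already begins with a splat).

```lua
-- Sketch only: not the real Module:Lang code.
-- Assumed: proto_param is true (|proto=yes), false (|proto=no) or nil (unset).
local function proto_prefix (text, language_tag, proto_param)
	if false == proto_param then
		return text;											-- |proto=no: never add a splat
	elseif (language_tag:find ('%-x%-proto$') or (true == proto_param)) then
		return '*' .. text;										-- tag ends in -x-proto, or |proto=yes
	end
	return text;												-- everything else is left unchanged
end

print (proto_prefix ('word', 'gmq-x-proto'))					-- *word
print (proto_prefix ('word', 'roa-x-noproto'))					-- word
```

Under these assumptions, roa-x-noproto fails the '%-x%-proto$' test because the tag ends in -x-noproto, so its text is rendered without a splat unless |proto=yes is set.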

@Trappist the monk To address your points:

I don't think this is the right place to decide whether Proto-Romance or Proto-Norse is a protolanguage in the true sense, but that's what they're called (rightly or wrongly), and they're used within the framework of the comparative method, so for practical purposes that's what they are, whether or not they should be.
Certainly in the case of Proto-Romance, it shouldn't be treated as reconstructed by default (i.e. it shouldn't automatically apply the splat), so it's an exception whichever way we look at it.
I don't think it's a good idea to use two codes, because:
1. This is an implementation issue, but having two codes would leave traces of that implementation issue in the metadata, making it appear as though there are two separate languages.
2. It becomes completely incoherent if someone uses roa-x-proto with |proto=no or roa-x-noproto with |proto=yes, because it essentially introduces a second, special way to add or suppress a splat in this one case.
3. This would inevitably lead to confusion when someone tries to use -x-noproto with some other protolanguage that doesn't support it.
4. It bulks up the data unnecessarily.

Instead, line 891 could have something like:

elseif proto_param == true or not no_splat_proto_t[language_tag] and language_name:find ('^Proto%-') then

That just requires a simple exception table called no_splat_proto_t, and for the proto_prefix function to accept language_tag as an additional argument (which is trivial).

Theknightwho (talk) 20:39, 4 August 2025 (UTC)
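The exception-table alternative can be sketched the same way. This is hypothetical: the argument order (language_tag appended as a fourth argument) and the separate |proto=no branch are assumptions, since the rest of proto_prefix is not shown here. In Lua, not binds more tightly than and, and and more tightly than or, so the condition groups as proto_param == true or ((not no_splat_proto_t[language_tag]) and language_name:find ('^Proto%-')), which is the intended reading.

```lua
-- Hypothetical exception table from the proposal above.
local no_splat_proto_t = {
	["roa-x-proto"] = true,		-- Proto-Romance: attested, so no automatic splat
}

-- Sketch only: argument order and the |proto=no branch are assumptions.
local function proto_prefix (text, language_name, proto_param, language_tag)
	if false == proto_param then
		return text;			-- |proto=no always wins
	elseif proto_param == true or not no_splat_proto_t[language_tag] and language_name:find ('^Proto%-') then
		return '*' .. text;		-- |proto=yes, or a "Proto-" name not listed as an exception
	end
	return text;
end

print (proto_prefix ('word', 'Proto-Norse', nil, 'gmq-x-proto'))		-- *word
print (proto_prefix ('word', 'Proto-Romance', nil, 'roa-x-proto'))		-- word
print (proto_prefix ('word', 'Proto-Romance', true, 'roa-x-proto'))	-- *word
```

Reading no_splat_proto_t[nil] simply returns nil in Lua, so a call that omits language_tag falls back to the plain "Proto-" name test.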

All reasonable objections. But that's why the first thing I wrote was: We have |proto= which accepts yes or no; is that not sufficient? You did not answer that question. I asked that question because adding hidden special-case code to override the normal operation of the template will also lead to confusion. If you don't want the automatic splat that prefixes all Proto <whatever>-language text, set |proto=no. Simple. No confusing special-case implementation. Autosplat is consistent for all Proto <whatever> languages (whether in the true sense or not). The reason that the splat is suppressed is obvious to readers of the article wikitext. (This last point suggests that, for readers of the rendered article, we might want to modify the title= attribute to somehow note that autosplat is suppressed for this 'text'; no idea what that modification might be.)

—Trappist the monk (talk) 22:43, 4 August 2025 (UTC)

@Trappist the monk I don't think it will lead to confusion, because those actually dealing with Proto-Romance would not be expecting it to be treated as reconstructed by default. Think of it like this: it's generally a very safe bet that protolanguages (or, at least, languages with names starting "Proto-") will need to be treated as reconstructed by default, but each individual language still needs to be treated on a case-by-case basis, and those who actually work with those languages won't find it confusing at all, because they know how the language works. Of all the confusing exceptions that exist with languages, this is one of the milder ones. Likewise, there are languages whose names don't begin with "Proto-" that should be treated as unattested with a * by default (e.g. Golyad, the Pre-Greek substrate, all but three words of Hunnic, anything starting "Pre-Proto-", etc.), but I don't want to tackle that right now. Theknightwho (talk) 12:27, 5 August 2025 (UTC)

Add Shaetlan

It is requested that an edit be made to the template-protected module at Module:Lang/data.

This template must be followed by a complete and specific description of the request, so that an editor unfamiliar with the subject matter could complete the requested edit immediately.

Edit requests to template-protected pages should only be used for edits that are either uncontroversial or supported by consensus. If the proposed edit might be controversial, discuss it on the protected page's talk page before using this template. Consider making changes first to the module's sandbox before submitting an edit request. To request that a page be protected or unprotected, make a protection request. When the request has been completed or denied, please add the |answered=yes parameter to deactivate the template.

Would it be possible to add Shaetlan as a new language please? It received the ISO 639-3 code scz last Wednesday. Thanks! — 🐗 Griceylipper (✉️) 19:43, 21 October 2025 (UTC)

Edit request 28 October 2025

This edit request has been answered. Set the |answered= parameter to no to reactivate your request.

Description of suggested change: Apologies, I do not know how the change would be made in the code, so I do not know what the diff would look like, but I am requesting that the automatic italicisation of Halkomelem (hur) be turned off because it uses Americanist phonetic notation, which contains Greek letters. Like Greek, Halkomelem should not be italicised per MOS:FOREIGNITALIC. Yue🌙 14:04, 28 October 2025 (UTC)

I guess I have to ask: are you sure? Do Halkomelem speakers actually write their language using Americanist phonetic notation? I ask because other languages aren't generally written in the International Phonetic Alphabet. Do Halkomelem speakers actually write this symbol: t̓ᶿ (U+0074: LATIN SMALL LETTER T + U+0313: COMBINING COMMA ABOVE + U+1DBF: MODIFIER LETTER SMALL THETA) when conducting the business of their community? Or is Americanist phonetic notation used to define proper pronunciation? Does written text use one of the three alphabets: Island, Cowichan, Stó꞉lō? See Halkomelem § Comparison.

Regardless, there is no automatic mechanism to prevent all text of a specific language tag from rendering in italics. There are quite a few other languages that use the Greek characters θ and χ in transliterations, so Module:Lang italicizes them on a language-by-language basis. Fortunately, should you decide that Halkomelem must not be italicized, there are only about 70 articles that use the hur language tag, so you could write a WP:AWB script to make sure that each of those {{lang}} or {{langx}} templates sets |italic=no.

But first, make sure that editors haven't created a mishmash that mixes the actual writing system with the pronunciation system in the Halkomelem article.

—Trappist the monk (talk) 16:00, 28 October 2025 (UTC)
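AWB's own find-and-replace rules would be written separately, so the following is not an AWB script. It is a Lua sketch of the rewrite suggested above: append |italic=no to {{lang|hur|...}} and {{langx|hur|...}} calls that do not already set |italic=. It assumes each call is written without spaces around |hur| and contains no nested templates; add_italic_no is a hypothetical helper name.

```lua
-- Sketch: add |italic=no to {{lang|hur|...}} / {{langx|hur|...}} calls lacking |italic=.
-- Assumes no nested templates inside the call (no '{' or '}' between the braces).
local function add_italic_no (wikitext)
	return (wikitext:gsub ('{{([Ll]angx?)|hur|([^{}]-)}}', function (name, params)
		if params:find ('|%s*italic%s*=') then
			return nil;		-- already sets |italic=; returning nil keeps the original match
		end
		return '{{' .. name .. '|hur|' .. params .. '|italic=no}}';
	end));
end

print (add_italic_no ('{{lang|hur|text}}'))		-- {{lang|hur|text|italic=no}}
```

The outer parentheses around gsub discard its second return value (the match count), so the function returns only the rewritten wikitext.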