Module talk:Lang-zh/Archive 4

This is an archive of past discussions about Module:Lang-zh. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Template:Zh-full

I've been working on merging {{zh-full}} into this one. The motivation is that where possible it makes sense to replace instances of {{zh-full}} with {{zh}}, as the recent work on this template has improved its output significantly. Where that is not possible, i.e. where {{zh-full}} was used because of the features it provides over this one, it should be possible to add those features to this template. In particular the ability to list things in an arbitrary order is something that was pretty much impossible before but can be easily done in Lua.

As a first step I've been going through articles using {{zh-full}} and replacing it with {{zh}} where possible. I've only been doing this as there are good editorial reasons: this template provides better output (proper language tagging, consistent italicisation), handles special cases better such as t and s being the same, handles empty fields properly, avoids redirects in its links, and is much shorter and easier to type. I also did other cleanup as I went, in particular of Chinese language.

Cantonese first issues

And it has been possible to replace them in almost all cases. The vast majority just had Chinese (simplified and traditional), pinyin and Wade–Giles in some combination. One had IPA, which I changed back to pinyin as more common, useful and easily understood. Otherwise they were straightforward replacements, with a few requiring 'first=t' to put traditional Chinese first.

The only two left are Hong Kong topics, Bat Seui Yiu... Yun Mei Dak Ho Pa and A.S. Watson Group, which have Jyutping and pinyin Romanisation and put the Jyutping first. This is easy to fix though: have 'first=t' also put Cantonese Romanisations before pinyin and Wade–Giles. The logic is that when first=t is used and Cantonese is supplied it's a Hong Kong topic: not Taiwanese, as Cantonese is not used there, and not mainland, as simplified Chinese should come first there. And in a Hong Kong topic, if both Cantonese and Mandarin Romanisations are given, the Cantonese should go first. This is a very simple fix: it only requires changing one line, this one:

		orderlist = {"c", "s", "t", "p", "tp", "w", "j", "cy", "poj", "zhu", "l"}

to something like:

		orderlist = {"c", "s", "t",  "j", "cy", "p", "tp", "w", "poj", "zhu", "l"}
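To make the effect of that one-line change concrete, here is a minimal standalone sketch of how an order list like this could drive the output. It is not the module's actual code: the `render` function and the `labels` table are hypothetical, and only the two order lists are taken from the lines quoted above.

```lua
-- Hypothetical labels; the real module's label text and links may differ.
local labels = {
    s = "simplified Chinese",
    t = "traditional Chinese",
    p = "pinyin",
    j = "Jyutping",
    cy = "Cantonese Yale",
}

-- Walk the order list and emit only the fields that were actually supplied.
local function render(args, orderlist)
    local parts = {}
    for _, key in ipairs(orderlist) do
        local value = args[key]
        if value and value ~= "" then
            parts[#parts + 1] = (labels[key] or key) .. ": " .. value
        end
    end
    return table.concat(parts, "; ")
end

-- The two order lists quoted in the discussion.
local current_order = {"c", "s", "t", "p", "tp", "w", "j", "cy", "poj", "zhu", "l"}
local cantonese_order = {"c", "s", "t", "j", "cy", "p", "tp", "w", "poj", "zhu", "l"}

local args = { s = "中国", p = "zhōngguó", j = "Gwong²zau¹" }
print(render(args, current_order))
print(render(args, cantonese_order))
```

With the second list, Jyutping is emitted before pinyin and nothing else moves, which is why the change is confined to a single line.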

The other options are

Add a separate option for 'Cantonese first'. Easy to do, but another option that hardly anyone will use seems unnecessary.
Add the code I envisaged at the start to output fields in the order they're given. I now see no need for this at all, given that none of the instances of {{zh-full}} used its ability to reorder fields; with the two exceptions noted above everything was in the same order, or with t and s swapped, which {{zh}} already supports.

Any thoughts?

I can perceive Cantonese first being used for some mainland/crossborder articles such as Cantonese people. Also if the template is ever expanded to include Hakka or Tibetan, then there would be cases where those should go first and so the system of prioritisation needs to be flexible enough to allow future expansion.

Perhaps change the first=t into a region specifier for TW, CH, HK, SG. So for example specifying region=HK would select the order {"c", "s", "t", "j", "cy", "p", "tp", "w", "poj", "zhu", "l"}

Another way would be to allow the editor to supply an ordered list to override the default. This could be made backwards compatible with first=t. For example if the editor did first=t,s,j the template would output t first, then s, then j, followed by any other supplied fields in the default order. If the editor did first=t,poj,tp then we would have the Taiwan related sections first and the others following. Most editors would not use this, leaving the default, but it would be there for those that wanted to. Also it wouldn't break any existing pages, as those currently with first=t would still get t output first followed by the others in default order, the same as today.
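The override described here could be sketched roughly as follows. This is only an illustration of the idea, not code from the module: `build_order` is a hypothetical helper, and the default order is copied from the list quoted earlier in this thread.

```lua
local default_order = {"c", "s", "t", "p", "tp", "w", "j", "cy", "poj", "zhu", "l"}

-- Build the output order: the keys named in first= come first, in the order
-- given, then every remaining key in the default order.
local function build_order(first)
    local known, seen, order = {}, {}, {}
    for _, key in ipairs(default_order) do
        known[key] = true
    end
    -- Split on commas and whitespace; skip unknown or repeated keys.
    for key in (first or ""):gmatch("[^,%s]+") do
        if known[key] and not seen[key] then
            seen[key] = true
            order[#order + 1] = key
        end
    end
    for _, key in ipairs(default_order) do
        if not seen[key] then
            order[#order + 1] = key
        end
    end
    return order
end

print(table.concat(build_order("t,s,j"), ","))
print(table.concat(build_order("t,poj,tp"), ","))
```

Because a bare first=t is just a one-item list, existing uses would keep working under this scheme without any changes to articles.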

Another feature of {{zh-full}} is that it allows you to rename the labels. I don't know if anyone has used that feature on any article? I also don't know if anyone would want to use that feature or if it could be deprecated without anyone noticing? Rincewind42 (talk) 15:29, 24 May 2014 (UTC)

I had thought of that before: doing it based on region rather than a single switch, first=t. The problem is it's quite disruptive – all existing instances of first=t would have to be found and updated, changed to region=tw or region=hk based on the article, and I don't know how you'd find them. Or you leave both first=t and region=xx in the template which introduces redundancy as with Cantonese=first, and seems overkill for something that there's no obvious need for - the only instances of {{zh-full}} with ordering unsupported by {{zh}} were with Jyutping first.

The editor specifying the order is another way of doing it. The way it would work is with an extra option, ordered=no. If the module detects this it doesn't use a fixed order; instead the order is the same as that of the parameters passed to the template. This is essentially how {{zh-full}} works, but done in code rather than with templates within templates as there. It also introduces redundancy but can be thought of as two levels: a switch for simple cases, and an option to use any order for more specialised cases.

I don't think replacing labels is a good idea. It would make the template much more complex and be little used (it wasn't used at all within {{zh-full}}). If editors need that degree of control over labels, links, formatting they need not use the template, or can use it for some languages but use {{lang}} with their own labels and formatting for those they want customised.--JohnBlackburne^words_deeds 16:22, 25 May 2014 (UTC)

I've changed the sandbox to put Cantonese first when first=t is specified; as noted above it's a very simple change. The results look like this

{{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|cy=Gwóngjàu}}

gives

simplified Chinese: 中国; traditional Chinese: 中國; pinyin: zhōngguó; Jyutping: Gwong²zau¹; Cantonese Yale: Gwóngjàu

while

{{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|cy=Gwóngjàu|first=t}}

gives

traditional Chinese: 中國; simplified Chinese: 中国; pinyin: zhōngguó; Jyutping: Gwong²zau¹; Cantonese Yale: Gwóngjàu

--JohnBlackburne^words_deeds 21:31, 26 May 2014 (UTC)

|first=t might also be used for Taiwan topics, or for ancient topics. In both cases one would still want pinyin before the Cantonese romanizations. Kanguole 22:53, 26 May 2014 (UTC)

Taiwan topics should not have or need Cantonese Romanisations; looking at Languages of Taiwan, Cantonese is not even mentioned. As for ancient topics it's usual to put simplified first and just give pinyin. I wish there were some easy way of checking this but I strongly suspect first=t is only used in Hong Kong and Taiwan topics and in those only Hong Kong topics include Cantonese [Romanisations].--JohnBlackburne^words_deeds 23:55, 26 May 2014 (UTC)

I've found a way of checking uses of first=t. I went to Special:Export, exported all the pages in Category:Articles containing traditional Chinese-language text, which gave me a 69MB XML file. I searched through that for all instances of {{zh}} with first=t in it, writing them out to a file, which I've copied to here: User:JohnBlackburne/zhdump. It's possible I missed some: the way I searched would have missed out templates split over multiple lines (I saw one) and also templates containing other templates such as {{linktext}}, but it should have found the majority. I can see six Jyutping and three Cantonese Yale, out of a little over 200, so it's almost all Taiwanese.--JohnBlackburne^words_deeds 23:57, 27 May 2014 (UTC)
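A search of that kind could look roughly like the sketch below. It is not the script that was actually used (the comment does not say what was); `find_first_t` is a hypothetical name, and the pattern is deliberately simple, so it shares the two limitations described above: it skips templates split over several lines and templates that contain other templates.

```lua
-- Collect every single-line {{zh|...}} call in the text that sets first= to
-- a value starting with "t". Calls with nested templates are skipped because
-- the pattern refuses any brace inside the call.
local function find_first_t(text)
    local hits = {}
    for call in text:gmatch("{{[Zz]h|[^{}\n]-}}") do
        if call:find("|%s*first%s*=%s*t") then
            hits[#hits + 1] = call
        end
    end
    return hits
end

local sample = table.concat({
    "Taipei ({{zh|t=臺北|p=Táiběi|first=t}}) is a city.",
    "China ({{zh|s=中国|t=中國}}) is a country.",
    "Split ({{zh|t=中國",
    "|first=t}}) over two lines.",
    "Nested ({{zh|c={{linktext|中}}|first=t}}) template.",
}, "\n")

for _, call in ipairs(find_first_t(sample)) do
    print(call)
end
```

Only the first sample line is reported; the split and nested calls are missed, which is why the count from such a search is a lower bound.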

Looking at the uses at User:JohnBlackburne/zhdump, which I've added article links to, there's at least one problem article, Jao Tsung-I, which has first=t, pinyin and Cantonese Romanisations. His bio is varied and interesting, with stints in Hong Kong and Singapore, so there's no obvious right way round; it may be based on his personal or professional preferences. It's not obvious the Cantonese Romanisations should go first in this case just because of first=t.

So I've thought of a way of handling traditional and Cantonese first separately, without introducing a new parameter or redoing everything with regions (which anyway would not help much with Jao Tsung-I): overload the existing parameter so you can supply other values. I.e. first=t, first=j (for Jyutping) and first=tj or first=jt are all supported. Easiest just to demonstrate it:

{{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|cy=Gwóngjàu|first=t}}:

traditional Chinese: 中國; simplified Chinese: 中国; pinyin: zhōngguó; Jyutping: Gwong²zau¹; Cantonese Yale: Gwóngjàu

{{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|cy=Gwóngjàu|first=tj}}:

simplified Chinese: 中国; traditional Chinese: 中國; pinyin: zhōngguó; Jyutping: Gwong²zau¹; Cantonese Yale: Gwóngjàu

{{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|cy=Gwóngjàu|first=t|labels=no}}:

中國; 中国; zhōngguó; Gwong²zau¹; Gwóngjàu

{{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|cy=Gwóngjàu|first=jt|labels=no}}:

中国; 中國; zhōngguó; Gwong²zau¹; Gwóngjàu

So Hong Kong articles might have first=tj/first=jt, Taiwan ones probably first=t. I don't know if first=j would ever be used but it's there if anyone needs it. Obviously it still puts Chinese characters first and everything else after, but I don't see any need for this to change, especially after looking at all the uses of {{zh-full}}, all of which put Chinese first. This can easily be extended to handle more cases.

The third way I suggested, of ordering the fields based on the order of the parameters, is I think now not possible. It would be easy enough to code but is incompatible with the visual editor, which sorts parameters alphabetically and so would not allow you to put them in a particular order.--JohnBlackburne^words_deeds 15:48, 29 May 2014 (UTC)

Other fields

The other thing I was looking at was other fields; what fields were being used that {{zh}} doesn't support? The answer is one, {{zh-IPA}}, in one article, which I replaced with pinyin. None of the others, so no {{zh-xiao}}, no {{zh-hkgov}}, no {{zh-viet}}. So there is no need to add any of them to this module, at least not based on their usage in {{zh-full}}. This ties into #Other Chinese scripts above; I was hoping it would give some indication of which if any other languages of China were already being used in the larger template, but it seems not.

Of them all the only one being used at all outside of {{zh-full}} is {{zh-IPA}}; it's a useful template, providing links to three relevant articles. It's the only one I can see making sense to add to this template, as an extra field, which would make it easier for editors to find if they want to add IPA. Or it could be tidied up and properly documented, and linked from here. --JohnBlackburne^words_deeds 19:44, 23 May 2014 (UTC)

There is a complication with how zh-IPA works in that it accepts a second field to switch between different types of IPA such as Mandarin and Cantonese IPA or others. I notice that {{Chinese}} has two fields, mi and ci, but no others. Are there any other IPA types that could be used? In {{Nihongo}} there is a blank extra field which, if added to this module, could work something like Chinese: 北京; however, I don't see any advantage in this when you could just have done Chinese: 北京; {{zh-IPA}}; {{zh-IPA}}. It would probably be easier just to add mi and ci fields, the same as {{Chinese}} has done. Rincewind42 (talk) 15:29, 24 May 2014 (UTC)

Having looked at it a bit more the IPA situation's a bit of a mess. There's a template for general IPA, {{IPA-all}}; {{IPA-wuu}} for Shanghainese/Wu is a redirect to it; there's a separate template for Cantonese/Yue, {{IPA-yue}}; there is none for Mandarin that I can see. There is though {{IPAc-cmn}} which converts pinyin to IPA ('cmn' is the IANA code for Chinese Mandarin; we use 'zh' for legacy reasons).

I'd be happiest leaving {{zh-IPA}} as a separate template; editors can use that or one of the other templates as appropriate. IPA's generally not needed for Chinese as pinyin is all you need for pronunciation and easier to learn than IPA; it's not like English where spelling and pronunciation are very irregular. As it's rarely used and there are different templates it's not obvious what should be added to this template. It was used only once within {{zh-full}} so there's no suggestion there that it needs to be added from that either.

What to do with {{zh-IPA}} is a separate issue but it should probably be either redirected to {{IPA-all}} like {{IPA-wuu}} if it's essentially identical, or moved to {{IPA-cmn}} like {{IPA-yue}}; properly named and added to the list of templates at Template:IPA-all/doc it might be used a bit more.--JohnBlackburne^words_deeds 15:55, 24 May 2014 (UTC)

I generally agree. I think the best next step from here, after the ordering issue above is finalised, is to look at related templates such as {{CJKV}}, which could be changed to run off almost the same code that this template currently uses and thus gain extra features like no links, no labels and ordering. Rincewind42 (talk) 15:04, 25 May 2014 (UTC)

CJKV

I hadn't looked at it before, or at least not recently, but looking at it now {{CJKV}} works almost identically to this one; the latest italic changes here bring it closer. The only differences I can see are the lack of language tags there for romanisations (which should be added there), and the lack of support for Japanese, Korean and Vietnamese here, which would be easy to add (including 'if the Chinese and Japanese are the same then combine them' - horrible to do in parser code, easy in Lua). So it would be straightforward to merge them.

And there are good reasons for doing so. There's not quite the same need as there is for {{zh-full}}, but apart from the reasons {{zh}} was converted to Lua, looking at how it's used, in some cases it's used where {{zh}} would do, e.g. in Chery A15, while most uses are very similar. As with {{zh-full}} it makes no sense to have two templates being used for mostly the same thing if there's no technical reason to keep them separate.--JohnBlackburne^words_deeds 15:40, 25 May 2014 (UTC)

Although there is no technical reason, there is a cultural reason why we have {{zh}} and {{nihongo}} as separate templates. Korean doesn't seem to have its own template. {{CJKV}} joins these together but it doesn't distinguish between Kanji and Kana or Hangul and Hanja. Also it doesn't include Japanese/Korean romanisations. If you combined identical Hanja/Hanzi characters, how would you label it? Also which language comes first? There needs to be a better way to order these. The parameter bloat might become significant. Look at {{Chinese}} for example. All those options, but how often are they used? In the end, though CJKV could be merged with zh, there will need to be two or three separate instances of near identical code. Partly to keep the parameters simple so that people can understand the template, and partly so the various interested groups don't conflict. Rincewind42 (talk) 05:20, 26 May 2014 (UTC)

Just for the record, I've come across numerous cases in the past where people have been upset or offended simply because of the template name. Like the former Yugoslavia and the rest of Eastern Europe, nationalism is kind of a thing in East Asia, and I've seen people getting upset over templates such as {{Japanese particle}} and {{Chinese}}, simply because the template has the word "Chinese" or "Japanese" in it, instead of "Chinese", "Korean", "Japanese", or whatever seems to be the topic of the article. Upset editors often blank or delete templates, or revert template additions, simply because they don't like the name of a template, and that's it; I remember having to make the {{Language particle}} redirect because one Korean editor got oh so offended by the word "Japanese". It's not as simple as things should be.

If we ever decide to use {{zh}} or anything else to replace CJKV templates after merging parameters, I think it would probably be a good idea to create template redirects as well; it's difficult to satisfy the needs of every single editor otherwise. Words such as "Chinese" and "Japanese" are a sensitive political issue in some areas, and edit wars often start over trivial matters such as these. I think the mindset is that if you put the {{Chinese}} template on a Korean topic article, it's claiming that it "belongs to China" or something (though logically speaking, it really shouldn't, it's just a template used to contain multilingual names). --benlisquare_T•C•E 05:45, 26 May 2014 (UTC)

As well as template redirects, using Lua offers another possibility: two templates with the same Lua implementation. That's how most of the citation templates work; they call Module:Citation/CS1. You can then supply extra parameters that tell it to do slightly different things (or very different things) depending on which template is invoking it, though in this case they work so similarly already that it should be possible to treat them the same way.

I'm not too worried about this causing problems, such as cultural ones. It's not that {{zh}} is for Chinese and {{CJKV}} is for other languages; it's mostly used for Chinese too. In fact some if not most of its uses are just Chinese, where the current {{zh}} is a drop-in replacement, the only differences being things in {{CJKV}} now fixed in {{zh}}. Further it's used very few times, 329 or around 1% of the transclusion count of {{zh}}, so should be far less disruptive than recent changes to this template.
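As a rough illustration of two templates sharing one implementation, the sketch below has two entry points calling the same rendering function with different field lists. Everything here is hypothetical: the entry-point names, the labels, and the `ja`, `ko` and `vi` keys are invented for the example and are not the parameters of {{CJKV}}. It also passes plain tables rather than the frame object a real module would receive, so that it runs standalone.

```lua
-- Shared implementation: emit the supplied fields in the order of `fields`.
local function render(args, fields)
    local parts = {}
    for _, field in ipairs(fields) do
        local value = args[field.key]
        if value and value ~= "" then
            parts[#parts + 1] = field.label .. ": " .. value
        end
    end
    return table.concat(parts, "; ")
end

local p = {}

-- Entry point a {{zh}}-style template might call (hypothetical field list).
function p.zh(args)
    return render(args, {
        { key = "s", label = "simplified Chinese" },
        { key = "t", label = "traditional Chinese" },
        { key = "p", label = "pinyin" },
    })
end

-- Entry point a {{CJKV}}-style template might call (hypothetical keys).
function p.cjkv(args)
    return render(args, {
        { key = "c", label = "Chinese" },
        { key = "ja", label = "Japanese" },
        { key = "ko", label = "Korean" },
        { key = "vi", label = "Vietnamese" },
    })
end

print(p.zh({ s = "中国", p = "zhōngguó" }))
print(p.cjkv({ c = "漢字", ko = "한자" }))
```

In a real module the table `p` would be returned at the end of the file, and each template would invoke its own entry point, so fixes to the shared function reach both templates at once.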

{{Nihongo}} is quite separate; not only does it share no fields with this one (except I think 'literally') but it works very differently, with a number of minor and major differences in output. If I were converting that to Lua I'd do it separately, i.e. as its own module.--JohnBlackburne^words_deeds 13:18, 26 May 2014 (UTC)

I should add that having looked at it again there are at least two bugs in the output of {{CJKV}} that need fixing. See Template talk:CJKV#Problems. The problems are precisely the sorts of problems that arise in complex parser code (the extra semi-colon was happening here too, with s = t, before it was converted to Lua). That template's code is particularly tricky though, even worse than this one before it was converted to Lua, so by far the easiest fix would be to convert it to Lua.--JohnBlackburne^words_deeds 13:41, 26 May 2014 (UTC)

First=tj

Breaking this out into a new section as I think this is ready for rolling out, but I'll summarise it again as it's embedded a few subsections back.

I've added support (to the sandbox) for putting Cantonese Romanisations 'first', that is before Mandarin Romanisations. So Jyutping and/or Cantonese Yale (whichever is specified) go before pinyin, Tongyong pinyin and Wade–Giles (again whichever is/are specified). The thinking is similar to putting traditional before simplified Chinese chars: it won't be used as much but it's there for when editors need it. My survey of the uses of {{zh-full}} suggests it's the only ordering not supported by the current template that there's any need for.

The way it works is by overloading the first= parameter, so as well as specifying first=t an editor can supply first=j, or combine them in one as first=tj or first=jt. If 't' is supplied then traditional Chinese characters go first. If 'j' is supplied then Cantonese Romanisations go first. Anything else is ignored (which allows for future support for other reordering rules using other letters). Here it is in action; examples have also been added to the testcases:

{{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|first=t|labels=no}}: 中國; 中国; zhōngguó; Gwong²zau¹
{{Zh/sandbox|s=中国|t=中國|p=zhōngguó|j=Gwong²zau¹|first=j, t|labels=no}}: 中國; 中国; Gwong²zau¹; zhōngguó
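The two reordering rules can be sketched as below. This is an illustration under stated assumptions rather than the sandbox code: `order_for` and its `flags` table are hypothetical, and the grouping of keys into Mandarin (`p`, `tp`, `w`) and Cantonese (`j`, `cy`) romanisations follows the description above.

```lua
-- Append every item of `items` to `list`.
local function append(list, items)
    for _, item in ipairs(items) do
        list[#list + 1] = item
    end
end

-- Compute the field order from two independent switches:
-- flags.t puts traditional before simplified characters,
-- flags.j puts Cantonese before Mandarin romanisations.
local function order_for(flags)
    local mandarin, cantonese = {"p", "tp", "w"}, {"j", "cy"}
    local order = {"c"}
    append(order, flags.t and {"t", "s"} or {"s", "t"})
    append(order, flags.j and cantonese or mandarin)
    append(order, flags.j and mandarin or cantonese)
    append(order, {"poj", "zhu", "l"})
    return order
end

print(table.concat(order_for({}), " "))
print(table.concat(order_for({ t = true }), " "))
print(table.concat(order_for({ t = true, j = true }), " "))
```

Chinese characters always stay ahead of the romanisations; each switch only swaps the order within its own group.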

This is better than other options considered.

Having Cantonese first when traditional is first works with most but not with all articles, and would change existing articles perhaps incorrectly.
Using regions (HK, TW, CN, SG, etc.) might work better but would be very disruptive and not handle all cases, such as Jao Tsung-I.
Adding another parameter would be unnecessary clutter, especially for something so little used.
Doing it optionally based on the order parameters are listed, like {{zh-full}}, is incompatible with the Visual Editor.

It doesn't change any existing instances of the template, so is very safe. It's also interesting as perhaps the first change that would be almost impossible to do with parser functions, i.e. without Lua (sure there would be a way but it would be horribly complex).

So I think this is ready to roll out to the main template/module.--JohnBlackburne^words_deeds 03:15, 30 May 2014 (UTC)

I would prefer that the 'tj' be delimited in some way such as a comma. e.g. 't,j'. This is because at some time in the future you may want to expand the list of accepted values for 'first' and some future attributes might not be single letters. This would future proof the template somewhat. Rincewind42 (talk) 14:43, 30 May 2014 (UTC)

Done. I might have quibbled over this if it were just for the reason given, as 26 letters of the alphabet should be enough for as many options in future as we might need. But I think it also helps visually: it's easier to see that there are two things, not one, as "jt" could e.g. be an odd abbreviation for jyutping. The delimiter can be anything non-alphabetic; comma, comma-space, slash, space should all work. The code supports multiple-character specifiers though it only recognises 't' and 'j' at the moment. I've updated the testcases and the examples above.--JohnBlackburne^words_deeds 15:28, 30 May 2014 (UTC)
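One way such parsing could work is sketched below; it is an assumption about the approach, not the module's actual code, and `parse_first` is a hypothetical name. Each run of letters is treated as one specifier, so any non-alphabetic character acts as a delimiter, and specifiers other than 't' and 'j' are ignored.

```lua
-- Split the value of first= into alphabetic specifiers and record which of
-- the recognised ones ('t' and 'j') were present.
local function parse_first(value)
    local flags = { t = false, j = false }
    for token in (value or ""):gmatch("%a+") do
        if flags[token] ~= nil then
            flags[token] = true
        end
    end
    return flags
end

for _, value in ipairs({ "t", "j, t", "t/j", "tj" }) do
    local flags = parse_first(value)
    print(value, flags.t, flags.j)
end
```

Note that in this sketch an undelimited "tj" is read as a single two-letter specifier and so matches neither rule, which is the trade-off of allowing multiple-character specifiers.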

Again please update the main module from its sandbox, to effect the changes described immediately above and in more detail in #Cantonese first issues.--JohnBlackburne^words_deeds 23:23, 31 May 2014 (UTC)

Done Jackmcbarn (talk) 18:53, 1 June 2014 (UTC)