Talk:Unicode/Archive 7

This is an archive of past discussions about Unicode. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Number of issues.

I just now edited the Issues section to include the number of identified "issues" with characters (code points): by my count, the April 2017 document cited lists 94 of them. I am copying and pasting them here from that document (with minor editing for brevity), as they may be helpful.

U+0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE: not usually considered a single letter.
U+01A2 LATIN CAPITAL LETTER OI: LATIN CAPITAL LETTER GHA, not OI
U+01A3 LATIN SMALL LETTER OI: LATIN SMALL LETTER GHA, not oi
U+01BE LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE: ligation of "ts"; not an inverted glottal stop
U+0238 LATIN SMALL LETTER DB DIGRAPH: ligature, not a digraph
U+0239 LATIN SMALL LETTER QP DIGRAPH: ligature, not a digraph
U+025B LATIN SMALL LETTER OPEN E: Latin small letter epsilon [I don't know whether it is "open" or "closed"; see U+025E]
U+025E LATIN SMALL LETTER CLOSED REVERSED OPEN E: Latin small letter closed reversed epsilon (reversed form of U+025B).
U+0285 LATIN SMALL LETTER SQUAT REVERSED ESH: reversed fishhook r with retroflex hook.
U+02C7 CARON: hacek
U+030C COMBINING CARON: combining hacek
U+034F COMBINING GRAPHEME JOINER: incorrect description of function; it does not join graphemes
U+039B GREEK CAPITAL LETTER LAMDA: preferably, but not necessarily, GREEK CAPITAL LETTER LAMBDA
U+03BB GREEK SMALL LETTER LAMDA: preferably, but not necessarily, GREEK SMALL LETTER LAMBDA
U+04A5 CYRILLIC SMALL LIGATURE EN GHE: not a decomposable ligature
U+04B5 CYRILLIC SMALL LIGATURE TE TSE: not a decomposable ligature
U+04D5 CYRILLIC SMALL LIGATURE A IE: not a decomposable ligature
U+0598 HEBREW ACCENT ZARQA: Misleading; probably should have been called Hebrew accent tsinnorit
U+05AE HEBREW ACCENT ZINOR: Should have been called Hebrew accent zarqa (= tsinor)
U+0670 ARABIC LETTER SUPERSCRIPT ALEF: Not an Arabic letter, but a vowel sign.
U+06C0 ARABIC LETTER HEH WITH YEH ABOVE: not a letter but a ligature
U+06C2 ARABIC LETTER HEH GOAL WITH HAMZA ABOVE: not a letter but a ligature
U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE: not a letter but a ligature
U+0709 SYRIAC SUBLINEAR COLON SKEWED RIGHT: SYRIAC SUBLINEAR COLON SKEWED LEFT
U+0964 DEVANAGARI DANDA: Despite the fact that these characters have "DEVANAGARI" in their names, these punctuation marks are intended for common use for the scripts of India.
U+0965 DEVANAGARI DOUBLE DANDA: Despite the fact that these characters have "DEVANAGARI" in their names, these punctuation marks are intended for common use for the scripts of India.
U+0A01 GURMUKHI SIGN ADAK BINDI: GURMUKHI SIGN ADDAK BINDI
U+0B83 TAMIL SIGN VISARGA: This character is actually the aaytham, and is not used as a visarga in Tamil.
U+0CDE KANNADA LETTER FA: There is no Kannada letter 'fa'; this character represents the syllable 'llla'. A formal alias correcting this error has been defined.
U+0E9D LAO LETTER FO TAM: The name for this character should have been fo sung, but that name is already used for U+0E9F. A formal alias LAO LETTER FO FON correcting this error has been defined.
U+0E9F LAO LETTER FO SUNG: The name for this character should have been fo tam, but that name is already used for U+0E9D. A formal alias LAO LETTER FO FAY correcting this error has been defined.
U+0EA3 LAO LETTER LO LING: The name for this character should have been lo loot, but that name is already used for U+0EA5. A formal alias LAO LETTER RO correcting this error has been defined.
U+0EA5 LAO LETTER LO LOOT: The name for this character should have been lo ling, but that name is already used for U+0EA3. A formal alias LAO LETTER LO correcting this error has been defined.
U+0F0A TIBETAN MARK BKA- SHOG YIG MGO: This character is used to indicate that a document is addressed to a superior (the "petition honorific"), but the Tibetan name actually indicates a superior addressing an inferior ("starting flourish for giving a command").
U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG: The tsheg mark is not restricted to intersyllabic usage, and would have been better named Tibetan mark tsheg.
U+0F0C TIBETAN MARK DELIMITER TSHEG BSTAR: This character is not a delimiter, but is a non-breaking version of the tsheg mark (U+0F0B) that is used exclusively between the letter NGA (U+0F44) and the shad mark (U+0F0D).
U+0FD0 TIBETAN MARK BSKA- SHOG GI MGO RGYAN: The syllable "BSKA-" does not occur naturally in Tibetan, and is a mistake for "BKA-" (cf. U+0F0A). A formal alias correcting this error has been defined.
U+11EC HANGUL JONGSEONG IEUNG-KIYEOK: HANGUL JONGSEONG YESIEUNG-KIYEOK
U+11ED HANGUL JONGSEONG IEUNG-SSANGKIYEOK: HANGUL JONGSEONG YESIEUNG-SSANGKIYEOK
U+11EE HANGUL JONGSEONG SSANGIEUNG: HANGUL JONGSEONG SSANGYESIEUNG
U+11EF HANGUL JONGSEONG IEUNG-KHIEUKH: HANGUL JONGSEONG YESIEUNG-KHIEUKH
U+156F CANADIAN SYLLABICS TTH: There is no 'tth' syllable. A better name would have been Canadian Syllabics asterisk.
U+178E KHMER LETTER NNO: As this character belongs to the first register, its correct transliteration is nna, not NNO.
U+179E KHMER LETTER SSO: As this character belongs to the first register, its correct transliteration is ssa, not SSO.
U+200B ZERO WIDTH SPACE: This isn't a "space". It is an invisible character that can be used to provide line break opportunities.
U+2113 SCRIPT SMALL L: Despite its character name, this symbol is derived from a special italicized version of the small letter "L".
U+2118 SCRIPT CAPITAL P: Should have been called calligraphic small p or Weierstrass elliptic function symbol, which is what it is used for. It is not a capital "P" at all. A formal alias correcting this to WEIERSTRASS ELLIPTIC FUNCTION has been defined.
U+234A APL FUNCTIONAL SYMBOL DOWN TACK UNDERBAR: named according to the Bosworth convention. Inconsistent with current APL specifications & the London convention; the names of these five symbols no longer match APL usage for up and down.
U+234E APL FUNCTIONAL SYMBOL DOWN TACK JOT: named according to the Bosworth convention. Inconsistent with current APL specifications & the London convention; the names of these five symbols no longer match APL usage for up and down.
U+2351 APL FUNCTIONAL SYMBOL UP TACK OVERBAR: named according to the Bosworth convention. Inconsistent with current APL specifications & the London convention; the names of these five symbols no longer match APL usage for up and down.
U+2355 APL FUNCTIONAL SYMBOL UP TACK JOT: named according to the Bosworth convention. Inconsistent with current APL specifications & the London convention; the names of these five symbols no longer match APL usage for up and down.
U+2361 APL FUNCTIONAL SYMBOL UP TACK DIAERESIS: named according to the Bosworth convention. Inconsistent with current APL specifications & the London convention; the names of these five symbols no longer match APL usage for up and down.
U+2448 OCR DASH: MICR ON US SYMBOL
U+2449 OCR CUSTOMER ACCOUNT NUMBER: MICR DASH SYMBOL
U+2629 CROSS OF JERUSALEM: cross potent. The actual cross of Jerusalem is a cross potent with a small crosslet added at each corner.
U+262B FARSI SYMBOL: This symbol is so named because, as a symbol of Iran, it cannot be encoded in ISO standards.
U+2B7A LEFTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE: LEFTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE VERTICAL STROKE
U+2B7C RIGHTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE: RIGHTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE VERTICAL STROKE
U+3021 HANGZHOU NUMERAL ONE: HANGZHOU is a misnomer. The Suzhou numerals (Chinese su1zhou1ma3zi) are special numeric forms
U+3022 HANGZHOU NUMERAL TWO: HANGZHOU is a misnomer. The Suzhou numerals (Chinese su1zhou1ma3zi) are special numeric forms
U+3023 HANGZHOU NUMERAL THREE: HANGZHOU is a misnomer. The Suzhou numerals (Chinese su1zhou1ma3zi) are special numeric forms
U+3024 HANGZHOU NUMERAL FOUR: HANGZHOU is a misnomer. The Suzhou numerals (Chinese su1zhou1ma3zi) are special numeric forms
U+3025 HANGZHOU NUMERAL FIVE: HANGZHOU is a misnomer. The Suzhou numerals (Chinese su1zhou1ma3zi) are special numeric forms
U+3026 HANGZHOU NUMERAL SIX: HANGZHOU is a misnomer. The Suzhou numerals (Chinese su1zhou1ma3zi) are special numeric forms
U+3027 HANGZHOU NUMERAL SEVEN: HANGZHOU is a misnomer. The Suzhou numerals (Chinese su1zhou1ma3zi) are special numeric forms
U+3028 HANGZHOU NUMERAL EIGHT: HANGZHOU is a misnomer. The Suzhou numerals (Chinese su1zhou1ma3zi) are special numeric forms
U+3029 HANGZHOU NUMERAL NINE: HANGZHOU is a misnomer. The Suzhou numerals (Chinese su1zhou1ma3zi) are special numeric forms
U+327C CIRCLED KOREAN CHARACTER CHAMKO: An instance of inconsistent transliterations, resulting from irreconciled North/South Korean positions.
U+327D CIRCLED KOREAN CHARACTER JUEUI: An instance of inconsistent transliterations, resulting from irreconciled North/South Korean positions.
U+A015 YI SYLLABLE WU: a syllable iteration mark, not a syllable "wu"
U+FA0E CJK COMPATIBILITY IDEOGRAPH-FA0E: unified CJK ideograph, not compatibility ideograph
U+FA0F CJK COMPATIBILITY IDEOGRAPH-FA0F: unified CJK ideograph, not compatibility ideograph
U+FA11 CJK COMPATIBILITY IDEOGRAPH-FA11: unified CJK ideograph, not compatibility ideograph
U+FA13 CJK COMPATIBILITY IDEOGRAPH-FA13: unified CJK ideograph, not compatibility ideograph
U+FA14 CJK COMPATIBILITY IDEOGRAPH-FA14: unified CJK ideograph, not compatibility ideograph
U+FA1F CJK COMPATIBILITY IDEOGRAPH-FA1F: unified CJK ideograph, not compatibility ideograph
U+FA21 CJK COMPATIBILITY IDEOGRAPH-FA21: unified CJK ideograph, not compatibility ideograph
U+FA23 CJK COMPATIBILITY IDEOGRAPH-FA23: unified CJK ideograph, not compatibility ideograph
U+FA24 CJK COMPATIBILITY IDEOGRAPH-FA24: unified CJK ideograph, not compatibility ideograph
U+FA27 CJK COMPATIBILITY IDEOGRAPH-FA27: unified CJK ideograph, not compatibility ideograph
U+FA28 CJK COMPATIBILITY IDEOGRAPH-FA28: unified CJK ideograph, not compatibility ideograph
U+FA29 CJK COMPATIBILITY IDEOGRAPH-FA29: unified CJK ideograph, not compatibility ideograph
U+FE18 PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET: A spelling error: "brakcet" should be "bracket". A formal alias correcting this error has been defined.
U+FEFF ZERO WIDTH NO-BREAK SPACE: Byte Order Mark (Naming it ZWNBSP was a mistake from the start.)
U+122D4 CUNEIFORM SIGN SHIR TENU: CUNEIFORM SIGN NU11 TENU
U+122D5 CUNEIFORM SIGN SHIR OVER SHIR BUR OVER BUR: CUNEIFORM SIGN NU11 OVER NU11 BUR OVER BUR
U+1B001 HIRAGANA LETTER ARCHAIC YE: The preferred name is HENTAIGANA LETTER E-1
U+1D0C5 BYZANTINE MUSICAL SYMBOL FHTORA SKLIRON CHROMA VASIS: BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS
U+1D300 MONOGRAM FOR EARTH: MONOGRAM FOR HUMAN
U+1D301 DIGRAM FOR HEAVENLY EARTH: DIGRAM FOR HEAVENLY HUMAN
U+1D302 DIGRAM FOR HUMAN EARTH: DIGRAM FOR EARTHLY HUMAN
U+1D303 DIGRAM FOR EARTHLY HEAVEN: DIGRAM FOR HUMANLY HEAVEN
U+1D304 DIGRAM FOR EARTHLY HUMAN: DIGRAM FOR HUMANLY EARTH
U+1D305 DIGRAM FOR EARTH: DIGRAM FOR HUMANLY HUMAN

--Sorry, my copy-and-paste did not retain the two columns they were in. 75.90.36.201 (talk) 20:06, 9 April 2018 (UTC)
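Several entries above note that "a formal alias correcting this error has been defined". As a rough sketch, assuming Python 3.3 or later (whose `unicodedata.lookup` also accepts formal name aliases), the snippet below checks that a few of these aliases resolve to the same code point while `unicodedata.name` still reports the original, unchanged name. The Lao and U+2118 aliases are the ones named in the list; the U+0CDE and U+FE18 alias spellings are taken from the Unicode NameAliases data rather than from the list. The `CORRECTIONS` table and `check_aliases` helper are hypothetical, for illustration only.

```python
import unicodedata

# Code point -> formal "correction" alias. The Lao and U+2118 aliases are named
# in the list above; the U+0CDE and U+FE18 spellings are assumed from the
# Unicode NameAliases.txt data. Not an exhaustive list (hypothetical sample).
CORRECTIONS = {
    0x0CDE: "KANNADA LETTER LLLA",
    0x0E9D: "LAO LETTER FO FON",
    0x0E9F: "LAO LETTER FO FAY",
    0x0EA3: "LAO LETTER RO",
    0x0EA5: "LAO LETTER LO",
    0x2118: "WEIERSTRASS ELLIPTIC FUNCTION",
    0xFE18: "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET",
}


def check_aliases(corrections):
    """Return (code point label, official name, alias, alias resolves?) rows."""
    rows = []
    for cp, alias in corrections.items():
        ch = chr(cp)
        # name() reports the original character name; character names are
        # never changed once published, errors included.
        official = unicodedata.name(ch)
        try:
            # Since Python 3.3, lookup() also accepts formal name aliases.
            resolves = unicodedata.lookup(alias) == ch
        except KeyError:
            resolves = False
        rows.append((f"U+{cp:04X}", official, alias, resolves))
    return rows


if __name__ == "__main__":
    for label, official, alias, ok in check_aliases(CORRECTIONS):
        print(f"{label}  {official}  ->  {alias}  (resolves: {ok})")
```

Catching `KeyError` means an alias missing from an older interpreter's Unicode database is reported as `resolves: False` instead of aborting the whole check.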

I formatted your previous edit, then counted the number of asterisks ("*"s) in the source text. 94 seems to be the correct number. However, this is on the verge of Original research. And will you track changes made to Unicode Technical Note #27? Love —LiliCharlie (talk) 20:33, 9 April 2018 (UTC)

I understand that there are (at least) 12 code points that represent non-existent "characters" (also known as "ghost characters"). According to https://www.dampfkraft.com/by-id/a824aa10/#A-Spectre-is-Haunting-Unicode, 妛挧暃椦槞蟐袮閠駲墸壥彁 are meaningless and NOT part of any language. In addition, as of 3/12/2018, parts of the issues section have been removed, which in my view amounts to vandalism. The most egregious removal is all mention of the (politically motivated) concessions the Unicode Consortium made to various nations because they, rather than the experts in the relevant languages, claimed to be the authoritative source for the language. The current article whitewashes this (to some extent) by implying that some of these disagreements are over "ancient" or "obsolete" language elements, when in fact some of them are in current (but "unofficial") use. Also, I vote that the 94 issues (or 106 if the above dozen are included) should be listed in the article as a collapsed table, sortable by code-point name or U-number. 72.16.99.93 (talk) 22:18, 3 December 2018 (UTC)

None of this is relevant. A bunch of these are controversial; after much discussion, the Wikipedia article is at caron, not hacek. The complaint you have about the APL characters says "named according to the Bosworth convention", which is a choice, not a mistake. Even the clear errors are irrelevant; we barely mention that Byzantine music and hentaigana are supported, so stressing about the name of one of those characters, a name that will have little effect on users, is beneath mention. Nobody will use 10% of Unicode's characters; it's not a real problem that there are 12 characters that have no real use.

Editing the issues section is not vandalism; it's people disagreeing with you. I'm not even sure what you're talking about; the last three months have seen no changes to the issues section.

(Please don't use xx/xx/20xx date formats; they're inherently ambiguous, as a significant number of readers will interpret them as month/day and a significant number will interpret them as day/month.)--Prosfilaes (talk) 21:46, 24 December 2018 (UTC)