Talk:Phonetic symbols in Unicode

Lots of Non-Displayed Characters

What's the point of creating this chart if you can't display the characters? Just to see the hex columns or rows? Even Firefox cannot show the hex ranges 02EA~1D2B and 1D66~1DBF (except for a very few)! Until most browsers support or are capable of displaying all those characters, a graphical representation is necessary, through picture/graphic file(s). Thanks. ~ Tarikash 00:34, 14 July 2006 (UTC).

That's unrelated to which browser you are using. You have to have Unicode fonts installed. I am using Firefox and most characters render properly for me. Of course you are free to add graphics, and to add a list with an entry for each character (name, purpose, etc.), along the lines of Miscellaneous Symbols/Letterlike Symbols. If you just want to see what a character should look like, you may also click on the external link (unicode.org), where you get PDF files of the ranges. But by all means, do add the graphical representation. I have no motivation to do that, because I see the characters already. dab (ᛏ) 16:09, 14 July 2006 (UTC)

By modifying things, installing font(s), or changing the default setup, many things can be achieved, but normally (with the default setup) the characters cannot be seen. I thought we were trying our best to make pages displayable with minimal or no extra modification for average users. Anyway, the background color approach is very nice: the character categories are easily distinguishable. Great. Editing individual graphics for each character takes too much time, but I'll try that in the future. Adding character entity names should be a little easier, though. ~ Tarikash 21:15, 14 July 2006 (UTC).

To use one of the available Unicode fonts to display the Unicode special characters, we need to specify class="Unicode" in the table's TR tag (or in each TD tag, but setting it on each TR is easier than on each TD). For wiki table code, we specify it after the "|-" (like |- class="Unicode"). Template code {{Unicode|char}}, <span class="Unicode"> ... </span>, etc. can also be used for each character. I've already updated a few articles with this class, and it looks like a few more characters are showing up than before. Thanks. ~ Tarikash 22:09, 15 July 2006 (UTC).
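As a rough illustration of the markup described above, here is a minimal Python sketch that generates wiki table rows carrying class="Unicode" after each "|-". The function names, the 16-cells-per-row layout, and the "wikitable" class on the table itself are assumptions made for this example, not something taken from the article's actual table code.

```python
def unicode_table_row(first: int, last: int) -> str:
    """Build one wiki table row whose cells are the characters first..last."""
    cells = " || ".join(chr(cp) for cp in range(first, last + 1))
    # "|-" starts a new row in wiki table markup; the class attribute follows it.
    return f'|- class="Unicode"\n| {cells}'


def unicode_table(first: int, last: int, per_row: int = 16) -> str:
    """Build a whole wiki table covering first..last, per_row cells per row."""
    rows = [
        unicode_table_row(start, min(start + per_row - 1, last))
        for start in range(first, last + 1, per_row)
    ]
    # The outer "wikitable" class is an assumption for this sketch.
    return '{| class="wikitable"\n' + "\n".join(rows) + "\n|}"


if __name__ == "__main__":
    # First two rows of the IPA Extensions block (U+0250..U+026F).
    print(unicode_table(0x0250, 0x026F))
```

Setting the class once per row keeps the markup short; the per-character alternatives ({{Unicode|char}} or a span) would need the wrapper repeated in every cell.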

Well, I am glad you know about {{unicode}}. Note that it is just a temporary thing: it may help certain browsers to display things, but in the long run (say, if you wait for another year or so), most browsers should be able to render all this out of the box. As I say above, you are welcome to add the {{unicode}}, of course, as well as other information. dab (ᛏ) 22:42, 16 July 2006 (UTC)

Until most browsers support these by default without any modification, we should still specify and use them. Removing all this code once it is no longer needed is only a matter of one bot and maybe an hour, am I right? So why should we leave things dysfunctional at this moment and wait a year for them to work? By keeping the chart and symbols displayable for that year, we will help many people to see and understand them better, and thus further the progress of Unicode. ~ Tarikash 22:58, 16 July 2006 (UTC).

"Semantic Phonemes"

The criticism related to "semantic phonemes" added by User:Indexheavy appears to be based on a number of misunderstandings, in particular surrounding the differences between glyphs, characters and referents (signifiés, viz. phonemes etc.) of characters. The topic it appears to address is the canonical names of some "IPA Extensions" characters. Since the Unicode range itself is called "IPA Extensions", it stands to reason that the character called "LATIN LETTER BILABIAL CLICK" is really the IPA symbol for a bilabial click, since there is no Latin letter for a bilabial click. Yes, the character names are often not very happy choices. This points to a lack of professionalism or consistency sadly often observed in the Unicode standard. However, the character names are merely convention anyway, and it is difficult to follow why they should be analysed depending on whether they describe a glyph shape or not, precisely because they are just a rather clumsily chosen convention. I suggest it is enough to just list the names and be done. If there is notable criticism related to the naming of the IPA characters, we should by all means cite it, but as it was, this discussion of "Semantic Phonemes" was simply "original research". dab (𒁳) 11:11, 1 May 2007 (UTC)

I think you misunderstand the purpose of the section. It wasn't meant so much as a critique as an aid in understanding the so often cited difference between glyphs and semantic characters. The phonetic characters serve as an excellent example where many readers will be able to grasp the difference. Moreover, I disagree that the names of the characters are merely convention. The code point means nothing to authors. All that remains to provide guidance on the use of the character are the glyph (which could vary widely from font to font) and the character name (along with the script and block, which are usually reflected in the name). The example you give of the bilabial click is an example of a semantic character name (and I do not think it's one of the unfortunate names). Anyway, I'm going to restore the section. Perhaps some discussion here will help us understand what needs to be changed to make the writing more clearly convey what I wanted to convey (which wasn't at all a critique). Indexheavy 19:45, 1 May 2007 (UTC)

Looking again at the change you made, this is the type of issue I'm trying to address. You wrote:

Unicode includes letters and marks from the International Phonetic Alphabet (IPA) and those supporting other phonetic alphabets too. In some instances, the canonical Unicode character name is IPA-centric in the sense that it appears to treat IPA conventions as part of the Latin alphabet, e.g. the character "LATIN LETTER BILABIAL CLICK" (U+0298 ʘ) is in fact an IPA symbol that has nothing to do with the Latin alphabet.

The issue is not with UCS names like that for character U+0298. Rather, it's that so many other characters have UCS names that do not correspond to the semantics (like "latin letter r with tail", U+027D) instead of a semantic phoneme name. Also, to clarify, these phoneme names are not unique to the IPA; they are used across all phonetic disciplines. I suppose the use of "Latin" in the name may be related to the issue I am trying to shed light on, since semantic phoneme characters would be their own writing system and not tied to any other alphabet. Anyway, it's clearly a confused topic, as are the issues surrounding phonetic characters (I don't like the use of the term "symbols" in the name of this article, since it also represents the same conflation I'm trying to address). Again, this isn't meant to be a critique of Unicode or UCS, except to the extent that they contribute to the confusion in the general public.

So the problem I'm highlighting is not the use of the term ‘Latin’ in the case of the bilabial click. It's that in terms of Unicode’s goals of semantics (not always shared throughout the Unicode, UCS and ISO communities), the various phonetic writing systems are all a single writing system. The different phonetic alphabets could be handled by changing fonts or through glyph variant selection from the same font.

However, the various phonetic writing systems all borrow glyphs from other writing systems (either unchanged or with minor modification). This all leads to confusion among those authoring with Unicode characters. It also serves as a great example of the difference between glyphs and semantic characters. Again, this is not original research; it's simply an example of the commonly cited confusion between glyphs, characters and referents (as you yourself cited above). However, too often the confusion is merely cited without giving readers any concrete examples. Here, with phonetic characters, there's an opportunity to drive home those distinctions with examples that are too muddled in other writing systems. For example, in the Latin alphabet the name "Capital Letter A" describes a semantic character. However, it also connotes a glyph, so a reader cannot separate those two constructs in that context. In the context of a phonetic writing system, by contrast, the name "Small Capital Letter R" describes only the glyph, not the semantic character. In a sense the name really implies “The same glyph as a Small Latin Capital Letter R”. This disjunction between character and glyph (which does not occur when discussing the same character within the Latin script) provides an opportunity for an example that drives home the glyph/character distinction. I hope this is clearer here. Perhaps this discussion can help clarify the topic in the article. Indexheavy 20:45, 1 May 2007 (UTC)
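For reference, the canonical names under discussion can be looked up directly. The following is a minimal sketch using Python's standard unicodedata module; the helper name describe and the choice of sample characters are assumptions made for illustration only.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and canonical Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    # Bilabial click, r with tail, small capital R, and plain Latin capital A.
    for ch in ("\u0298", "\u027D", "\u0280", "A"):
        print(describe(ch))
```

The names returned for U+027D and U+0280 are built from letter-shape terms ("R WITH TAIL", "SMALL CAPITAL R"), while the name for U+0298 refers to a sound ("BILABIAL CLICK"), which is the contrast being argued over in this thread.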

I appreciate the point you are trying to make, but I repeat that your discussion is confused. First things first: what are your sources? I am prepared to read you charitably, but do you have any specific source for your "often cited difference between glyphs and semantic characters"? I do understand your use of "semantic", but it is itself confused and idiosyncratic: whence do you take your term "semantic character"? Proper terminology is simply character (grapheme) vs. glyph. "Capital Letter A" denotes a character just as much as "r with tail" or "IPA symbol for bilabial click" or "Small Capital Letter R": these do not denote glyphs. A glyph is a specific graphical realisation of a character. I am afraid that you are yourself still rather confused on these points, which obviously doesn't help you in making the points you want. Also, this whole discussion would belong on character (computing); it is not really pertinent to Unicode phonetic symbols in particular. The problem UCS is facing is always: how do we delimit a single grapheme? There is no simple solution, since graphemes tend to blend into one another. Thus, any encoding standard will have to draw arbitrary lines. If you like, it is arbitrary to give two codepoints to Greek Α and Cyrillic А: the Cyrillic alphabet is "really" just the Greek alphabet with a few extra letters for Slavic phonemes. However, they have evolved apart far enough that there could be no question of treating them as a single script (case in point, Η vs. Н). But the Coptic alphabet is an example where UCS at first opted to consider it an extension of the Greek alphabet, and later (4.1) changed its mind about it (which results in another hair-raising instance of a script's codepoints spread all over the BMP). These are the examples you are looking for. Discussing this with IPA is confusing.
Strictly speaking, IPA a is not the same character as Latin a, but it would have been absurd to allocate a separate codepoint (not more absurd than allocating codepoints for Roman numerals, though; I wouldn't put it beyond the Unicode Consortium to come up with "IPA LETTER OPEN FRONT UNROUNDED VOWEL", which will simply map to "a" glyphs). These are often difficult judgement calls, and the consortium gets it right sometimes, and not quite right at other times. We could discuss this at character (computing), character encoding, Universal Character Set, Unicode Consortium or similar, but I really see no reason to detail it here. dab (𒁳) 09:05, 2 May 2007 (UTC)
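The Greek/Cyrillic look-alikes mentioned above can be checked with a short Python sketch. It relies only on the standard unicodedata module; the names PAIRS and compare are hypothetical helpers introduced for this example.

```python
import unicodedata

# Look-alike capital letters: (Greek, Cyrillic).
PAIRS = [("\u0391", "\u0410"), ("\u0397", "\u041D")]


def compare(greek: str, cyrillic: str) -> dict:
    """Summarise how two visually similar characters differ as characters."""
    return {
        "same_code_point": greek == cyrillic,
        "greek": unicodedata.name(greek),
        "cyrillic": unicodedata.name(cyrillic),
        # Lowercasing shows that the scripts treat the letters differently.
        "lowercase": (greek.lower(), cyrillic.lower()),
    }


if __name__ == "__main__":
    for greek, cyrillic in PAIRS:
        print(compare(greek, cyrillic))
```

The second pair is the "case in point": the same shape is the vowel eta in Greek but the consonant en in Cyrillic, and the two lowercase forms no longer look alike.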