Talk:Mapping of Unicode characters

I added the summary/categorized table of the UCS as I said I would on the UCS discussion page. I think if anyone feels the table should be narrowed, the decimal start and end could be omitted without much loss of readability. As I said in edit summaries and over at the UCS discussion page, I'd also like to add links from within the table to sections of the mapping of Unicode characters article and to other articles too. I think each of the broad categories (lettered A through N) should be discussed in this article. Then links from each script-block could go to the article on the script or to an article on the script in Unicode/UCS.

So I plan to add the following sections to this article:

Scripts (Modern and Ancient)
Phonetics
Unified Diacritics
Unified Punctuation
Symbols
Numerals
Musical Notation
CJK and Unihan
Compatibility characters (legacy and others) and normalization
Control characters, format characters and variation selectors
Surrogates
Private Use Code Points

Anyone else is welcome to jump in on these tasks. --Indexheavy 01:16, 25 April 2007 (UTC)

On the charge of editorializing

I can understand how you might read it that way in isolation. I'm not trying to editorialize so much as help make the distinction between semantic characters and glyphs clearer. Many people cite it as a mantra, but don't necessarily understand it (now I'm editorializing). The point of this section (and of what I plan to add to the linked main article) is to show that UCS lists the characters according to their glyph names. Meanwhile Unicode adds alias names that try to get at the phoneme semantics. Right now it's a hybrid that serves as an excellent example of this distinction so often cited (numerals too, though less so). I hope that makes it clearer what I'm trying to do there. In the past many of these articles have simply been long lists of Unicode characters (at one point there was a single article devoted to every character). I didn't find that very encyclopedic. I think here at Wikipedia we serve readers better by providing examples, fleshing out these categories, and expositing on some of the idiosyncratic characters (like the phoneme characters). --Indexheavy 02:55, 30 April 2007 (UTC)

BTW, perhaps I'm not understanding correctly what you thought was editorializing. Please respond here to clarify. Indexheavy 02:56, 30 April 2007 (UTC)

The problem with the Unicode Consortium is that they seem to think their character names are self-explanatory, except in some cases where they for some reason or other feel disposed to add a gloss. This is a problem (the recent addition of the cuneiform range really drove this home to the point of ridicule), and should be duly discussed, citing notable sources. But so far it is your choice to give such weight to the character name. A name is just that: a unique tag for a codepoint. The actual reason for encoding a character is buried in proposals somewhere. Thus, for a sourced discussion of why a character was encoded and not another, you have to dig up these proposals, study them, and quote from them. Just drawing your own conclusions from the names in the character charts is not helpful and violates WP:OR and specifically WP:SYN. dab (𒁳) 11:43, 2 May 2007 (UTC)

For some reason I missed your comment here until now. I wasn't ignoring you, I just didn't see it. I'm sure there are all sorts of interesting stories, behind-the-scenes disputes and whatnot surrounding Unicode and the UCS. I'm not trying to write about that (nor do I have any expertise or sources on it). I'm trying to write from the Unicode Standard and the other publications of the Unicode Consortium on their rendition of the "mapping of unicode characters". You're accusing me of violating WP:OR, yet I say again, I'm the only one who has added a reference to this article. I understand I could use some more specific references, but it's quite disingenuous to accuse me of OR when not a single reference existed for this article until I began my edits. Secondly, on the charge of violating WP:SYN, I'm drawing only from the Unicode Standard (which is what I'm most familiar with) and not synthesizing from multiple sources as the policy outlines. I'm also not trying to advance a position. Perhaps if you told me what position you fear I'm advancing we could clear the air and I could try to avoid that misperception as I draft and redraft my material. Indexheavy 09:59, 9 May 2007 (UTC)

Indexheavy

Indexheavy, before you continue "overhauling" this article, may I ask you to cite your sources. Your "semantic phonemes" and "semantic characters" etc., while well-meant, simply add to the confusion (as I argued here). You want to "help make the distinction between semantic characters and glyphs clearer". I appreciate the thought, but at present you are not exactly helping. First of all, show that your usage of "semantic character" (as opposed to simple "character") is in any way endorsed by Unicode. Unless you do that, I'm afraid we'll have to deep revert to April 25. Thanks. dab (𒁳) 11:37, 2 May 2007 (UTC)

Just to help you understand where my terminology comes from, here's a useful quote from The Unicode Standard 5.0 (p15): "The Unicode Standard draws a distinction between characters, which are the smallest components of written language that have semantic value, and glyphs, which represent the shapes that characters can have when they are rendered or displayed". For Unicode (especially in contrast to ISO and the UCS without Unicode), many of the compatibility characters (like the Arabic initial, isolated, medial and final forms) are redundant. They are character encoding forms and not simply the character as "the smallest components of language that have semantic value" but rather characters that encode a specific abstract glyph for another character. Anything could be encoded as a character. For example, one could designate that code point U+E0FFA will be the letter 'g' from Linotype's Times Roman font version 2.3 released in 1992 (the dingbat characters are a similar example acknowledged by the Unicode Standard). However, these are not examples of semantic characters, but rather characters that encode glyphs. Unicode’s approach in contrast involves moving the handling of these forms/variants to smart font technology and smart text rendering. These are distinctions made in the Unicode Standard: distinctions I'm trying to explain to a general reader in an encyclopedic manner. I feel a bit like I'm taking shots in the dark here. I'm having a hard time understanding how you read the Unicode Standard. But I'm trying to find ways to begin the conversation. Please let me know how you might rephrase some of my prose. In doing that we might start to understand the different readings. Indexheavy 11:17, 9 May 2007 (UTC)
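For readers following the compatibility-character point, here is a minimal sketch (not taken from the Unicode Standard or from the article) of how the four Arabic presentation forms of one letter relate to the single base character. It relies only on Python's standard `unicodedata` module; the names `BEH`, `PRESENTATION_FORMS` and `folds_to_base` are made up for this illustration.

```python
import unicodedata

# The base character: ARABIC LETTER BEH.
BEH = "\u0628"

# Its four compatibility presentation forms (Arabic Presentation Forms-B).
PRESENTATION_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def folds_to_base(form: str, base: str) -> bool:
    """Return True if compatibility normalization (NFKC) maps form to base."""
    return unicodedata.normalize("NFKC", form) == base


for position, form in PRESENTATION_FORMS.items():
    # Print the position, the character name, and whether it folds to BEH.
    print(position, unicodedata.name(form), folds_to_base(form, BEH))
```

Canonical normalization (NFC) leaves the presentation forms untouched, while compatibility normalization (NFKC) folds them into the base letter, which is one way of seeing that they encode shapes of an existing character.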

Longest page on the English Wikipedia

According to this: special:longpages, this page is the longest page on the English Wikipedia, at 688,000 bytes. Either this number is bogus, or this page will take a very long time to load on a low-speed link.

It appears that the HTML table was generated by a word processor. Please consider using a Wiki table or at least a better HTML editor. Thanks. -Arch dude 23:26, 3 May 2007 (UTC)

Please lend a hand in improving the table. The conversion to a wikitable might help, but it's largely a false efficiency. The wikitable still needs to be converted to an HTML table when it's delivered, so everything gained in the compact wikitable syntax is lost upon delivery (keep in mind the size of the article in that list is the storage size, not necessarily the delivery size; when the table's delivered the "|-" and "|" markers are replaced with complete "<tr></tr>" and "<td></td>" syntax). The wikitable gains other efficiencies by simply disallowing much of the HTML table semantics. The table was largely generated by hand (not by a word processor). Unfortunately, many of the styles had to be added in-line because Wikipedia doesn't support embedded or linked stylesheets for table styling (which would make the total size considerably smaller). If anyone wants to reduce the size of the table, the styles could probably be handled in some other way (I'm not familiar enough with wiki styling conventions). Also it could be reduced by removing the tooltips, but I think they're quite helpful. Finally, it might make sense to move many of the table details off to separate articles once they're created. Then simply a summary table of the individual tables could appear on this page. So in summary:

Converting to a wikitable (not much gained)
Changing the styling (borders, cell horizontal and vertical alignments) to another syntax
Removing or shortening tooltips (these are repeated for every cell with a lengthy phrase)
Breaking the table out into the separate related articles (I'll probably do this once I stabilize the table and finish the detailed articles)

My goal here was to create a drill-down type group of articles, where one could start at this article and see how the various Unicode Planes and Blocks were grouped together and then follow through to see more detail on each block/script/character general category. So moving the tables to other articles would be consistent with that drill-down approach. Indexheavy 02:26, 4 May 2007 (UTC)
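The storage-size versus delivery-size point above can be sketched roughly as follows. This is only an illustration under stated assumptions: `wikitable_to_html` is a hypothetical toy converter that handles just the `{|`, `|-`, `|` and `|}` markers with one cell per line. It is not the MediaWiki parser, and real delivered markup will differ.

```python
def wikitable_to_html(wikitext: str) -> str:
    """Convert a tiny wikitable subset to HTML (illustration only)."""
    html = []
    row_open = False
    for line in wikitext.strip().splitlines():
        line = line.strip()
        if line.startswith("{|"):        # table start
            html.append("<table>")
        elif line.startswith("|}"):      # table end: close any open row first
            if row_open:
                html.append("</tr>")
                row_open = False
            html.append("</table>")
        elif line.startswith("|-"):      # row separator
            if row_open:
                html.append("</tr>")
            html.append("<tr>")
            row_open = True
        elif line.startswith("|"):       # a single cell
            if not row_open:
                html.append("<tr>")
                row_open = True
            html.append(f"<td>{line[1:].strip()}</td>")
    return "\n".join(html)


# Two rows in the spirit of the block table: start, end, block name.
SAMPLE = """
{|
|-
| 0000
| 007F
| Basic Latin
|-
| 0080
| 00FF
| Latin-1 Supplement
|}
"""

html = wikitable_to_html(SAMPLE)
stored = len(SAMPLE.strip().encode("utf-8"))   # size as stored wikitext
delivered = len(html.encode("utf-8"))          # size as delivered HTML
print(stored, delivered)
```

Under these assumptions the delivered HTML is larger than the stored wikitext, so a smaller number on the long-pages list would not by itself mean a faster page load.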

I shortened the titles (tooltips) considerably. I also removed most of the inline styles on the table cells. It still doesn't quite look the way I want it to, but it's readable and it looks decent (oh, if only the Wikimedia software developers would enter the 21st century). The class and title attributes could probably be eliminated completely if we need to make it smaller. However, the steps I already took get us out of the top 15 articles, so maybe we're off the radar now. I do think that breaking the detailed tables off into separate articles makes a lot of sense, so this article could be reduced substantially that way (in time anyway). 04:10, 4 May 2007 (UTC)

Thanks for considering all of the options. You are clearly on top of the situation. If you intend to subdivide the article eventually, may I recommend that you avoid all of the intermediate steps? There are no rules, and I'm just another editor with an opinion, but as you point out, many of the gains are either trivial or bogus. The big win occurs when you split the article. I therefore propose that we live with it as it is until you are prepared to split it. I am not competent to help much. Best of luck on this, and keep up the good work! -Arch dude 13:23, 4 May 2007 (UTC)