Talk:Plane (Unicode)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Elphion (talk | contribs) at 16:21, 5 October 2016 (0x00E00000 to 0x00FFFFFF/0x60000000 to 0x7FFFFFFF: not Unicode). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
WikiProject Computing: Unassessed
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on Wikipedia's content assessment scale.
This article has not yet received a rating on the project's importance scale.

Turoslangos is playing games here. Neither UTC nor WG2 will accept Old Hungarian into the BMP. There isn't room, and neither is there justification for encoding it there. -- Evertype· 21:04, 7 November 2008 (UTC)

Plane 16 and "20-bit limit"

Obviously, Plane 16 (100000-10FFFF) is a 21-bit entity (why they crashed through to Plane 16 with planes 3-13 unused seems rather inelegant here, but I'm not a Unicode expert; I can, however, decipher hexadecimal). I have no idea how to "improve" ("correct"?) this, but it needs to be done. Grndrush (talk) 17:18, 3 January 2009 (UTC)

I was about to say much the same. Is the answer to call it a 17-plane limit and ignore the bit-question? Alternatively one could explain that the 20-bit limit is a matter of the address space defined by the available surrogate pairs, and thus defines the number of planes available beyond the BMP. (If I have understood aright…) Ian Spackman (talk) 00:11, 28 July 2009 (UTC)
21 bits is just an outcome, not the preset limit. Here we go. The BMP is defined as the full 16-bit range (hhhh): 0000-FFFF, 65,536 numbers. (With a plane prefix that is 00hhhh, so plane = 0.) In this plane, 1024 high surrogates and 1024 low surrogates are defined, at D800-DBFF and DC00-DFFF. Surrogates must be used in pairs (one high, then one low) to point to a character, so they can identify exactly 1024x1024 = 2^20 (~1M) points; that is where the "20-bit limit" comes from. Written together, a pair hhhh(high).hhhh(low) takes 32 bits. So the 1M points lie within the range D800.DC00 - DBFF.DFFF (but not every number in that range is a valid pair).
In comes UTF-16. UTF-16 maps these 32-bit pairs 1:1 onto the range 10000-10FFFF hex, starting right after plane 0 (at FFFF+1), and the ~1M points fill that range exactly, creating planes 1 to 16 decimal (16 = 10 hex, the final plane). Now there are no unused numbers in between, and the whole range 0-10FFFF can be written in 21 bits.
So because there are 1024x1024 surrogate pairs defined, the highest number, 10FFFF, needs 21 bits. A plane 17 starting at 10FFFF+1 = 110000 would still fit in 21 bits (which reach 1FFFFF), but it has no corresponding high-low surrogate pair, so UTF-16 cannot represent it.
Nowadays the U+hhhhhh notation is commonly used. -DePiep (talk) 17:13, 6 October 2010 (UTC)
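For anyone who wants to check the arithmetic in the comment above, here is a minimal Python sketch of the surrogate-pair mapping. The function names are made up for illustration; the constants are the surrogate ranges quoted above.

```python
# Illustrative helper names (not from any library); constants are the
# surrogate ranges D800-DBFF (high) and DC00-DFFF (low).
HIGH_FIRST, HIGH_LAST = 0xD800, 0xDBFF
LOW_FIRST, LOW_LAST = 0xDC00, 0xDFFF
SUPPLEMENTARY_FIRST, UNICODE_LAST = 0x10000, 0x10FFFF


def pair_to_code_point(high, low):
    """Map a high/low surrogate pair onto the range 10000-10FFFF."""
    if not (HIGH_FIRST <= high <= HIGH_LAST and LOW_FIRST <= low <= LOW_LAST):
        raise ValueError("not a valid high/low surrogate pair")
    # 10 bits come from the high surrogate, 10 bits from the low one.
    return SUPPLEMENTARY_FIRST + ((high - HIGH_FIRST) << 10) + (low - LOW_FIRST)


def code_point_to_pair(cp):
    """Inverse mapping; only defined for planes 1 to 16."""
    if not (SUPPLEMENTARY_FIRST <= cp <= UNICODE_LAST):
        raise ValueError("no surrogate pair exists for this value")
    offset = cp - SUPPLEMENTARY_FIRST  # a 20-bit number
    return HIGH_FIRST + (offset >> 10), LOW_FIRST + (offset & 0x3FF)
```

The first pair (D800, DC00) lands on 10000 and the last pair (DBFF, DFFF) lands on 10FFFF, so the 2^20 pairs cover planes 1 to 16 with nothing left over. A value such as 110000 is rejected by `code_point_to_pair` even though it still fits in 21 bits.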

Typo in "Supplementary Special-purpose Plane" ??

The section "Supplementary Special-purpose Plane" includes the line:

Variation Selectors Supplement (0E0100–E01EF)

That zero in front of the first hex number sure looks wrong to me, but I honestly don't know enough about this topic to know if it serves some actual purpose. Would someone better informed please fix it if it's wrong, or say why it's right?
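On the narrow numeric question: a leading zero does not change a hexadecimal value, so 0E0100 and E0100 denote the same code point, and the difference is only one of formatting consistency between the two ends of the range. A small sketch (the function name `plane_of` is invented for illustration) shows both ends falling in plane 14:

```python
def plane_of(cp):
    """Return the plane number (0-16) of a Unicode code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    # Each plane holds 0x10000 code points, so the plane is the part above 16 bits.
    return cp >> 16


start = int("0E0100", 16)  # as written in the article line quoted above
end = int("E01EF", 16)
```

Here `start` equals `0xE0100`, and `plane_of(start)` and `plane_of(end)` both give 14.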

Private Use Area planes for social networks

I've been finding HTML documents with glyphs for Facebook, Twitter, etc. as Unicode characters in the Private Use Area planes. This requires a custom font. Any references on this? --John Nagle (talk) 20:46, 30 April 2013 (UTC)

As the definition goes: anyone can publish or use a character definition in PUA space (example: I may have a PUA character to mail to my spouse to say X, and only we two know. We don't see the font, but the char number is enough for us to meet). If FB or TWI does so, it is up to them to provide the font, and to make it work publicly. If they can't get that right, the reader will see the wrong character. Like in the old days: question marks at best.
Actually, is that so? Examples by FB or TWI? It could be that users/companies are using PUAs (writing on FB or TWI), but then the issue is with those users. -DePiep (talk) 21:08, 30 April 2013 (UTC)
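To check whether a character found in such a document really sits in private-use space, a sketch like the following may help. The range table lists the three private-use ranges (U+E000-U+F8FF in the BMP, plus planes 15 and 16 without their last two code points); the function name is made up for illustration.

```python
import unicodedata

# The three private-use ranges: one in the BMP, then planes 15 and 16.
PUA_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def is_private_use(cp):
    """True if the code point lies in one of the private-use ranges."""
    return any(first <= cp <= last for first, last in PUA_RANGES)


def private_use_chars(text):
    """List the private-use code points in a string, as U+ notation."""
    return [f"U+{ord(ch):04X}" for ch in text if is_private_use(ord(ch))]
```

Python's own `unicodedata.category` reports `Co` for these code points, which gives an independent cross-check. Neither check says what glyph the author intended; that still depends on the font the site supplies.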

UTF-8 "designed for 2^21 bits"

The UTF-8 coding scheme was designed when Unicode was still contemplating a 31-bit space. It was not "designed" for a limit of 2^21 codepoints, and was eventually restricted to a much smaller number anyway (0x10FFFF). Elphion (talk) 01:13, 3 October 2016 (UTC)

Why would Unicode modernize a code space by making it smaller? 108.71.123.25 (talk) 16:05, 5 October 2016 (UTC)
Because otherwise the parties could not agree on a standard. Too many manufacturers were already heavily invested in 16-bit characters. UTF-16 was the compromise that allowed the standard to go forward. When eventually we run out of space (and we will, though computing technology will have changed a lot by the time that happens), larger spaces will be introduced. But they will not be "Unicode". -- Elphion (talk) 16:18, 5 October 2016 (UTC)
But 0x00E00000 to 0x00FFFFFF and 0x60000000 to 0x7FFFFFFF were assigned! And my flip phone runs an operating system that uses a 32-bit code space. 108.71.123.25 (talk) 16:21, 5 October 2016 (UTC)
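To illustrate the point at the top of this thread about the original 31-bit design: the original UTF-8 scheme allowed sequences of up to six bytes, while the restricted form stops at four bytes because nothing above 0x10FFFF is allowed. A rough sketch, with a hypothetical function name, of the sequence length under the original scheme:

```python
# Largest value representable by an n-byte sequence in the original
# (31-bit) UTF-8 design, for n = 1..6.
ORIGINAL_UTF8_LIMITS = (0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF)


def utf8_length_original(value):
    """Bytes needed for a value under the original 31-bit UTF-8 design."""
    if value < 0:
        raise ValueError("negative values cannot be encoded")
    for length, limit in enumerate(ORIGINAL_UTF8_LIMITS, start=1):
        if value <= limit:
            return length
    raise ValueError("value does not fit in 31 bits")
```

A four-byte sequence has room for values up to 0x1FFFFF (21 bits), which is where the "2^21" figure comes from; the actual ceiling of 0x10FFFF is set by UTF-16, not by the UTF-8 bit layout.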

0x00E00000 to 0x00FFFFFF/0x60000000 to 0x7FFFFFFF

Some operating systems still have these as private use areas. 108.71.123.25 (talk) 16:07, 5 October 2016 (UTC)

But those are not Unicode planes, the subject of this article. The Unicode standard sets a maximum of 17 planes. There is nothing to stop people from storing other values in 32 bits, but that's not Unicode. -- Elphion (talk) 16:13, 5 October 2016 (UTC)
The Universal Character Set still has these. Some operating systems still have these. My flip phone has one such operating system that uses UTF-32/UCS-4, and it shows 8-digit code points. 108.71.123.25 (talk) 16:17, 5 October 2016 (UTC)
No, UCS was revised to agree with Unicode, for consistency. Whatever your flip phone uses is not Unicode, and not UCS-4, no matter how it might be labeled. -- Elphion (talk) 16:21, 5 October 2016 (UTC)
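The 17-plane limit mentioned above is easy to test against the values in this section's heading. A minimal sketch (the names `UNICODE_MAX` and `in_unicode_code_space` are invented here):

```python
UNICODE_MAX = 0x10FFFF  # last code point of plane 16, i.e. 17 planes in total


def in_unicode_code_space(value):
    """True if a 32-bit value is a Unicode code point (planes 0-16)."""
    return 0 <= value <= UNICODE_MAX


def accepted_by_chr(value):
    """Cross-check: Python's chr() only accepts values below 0x110000."""
    try:
        chr(value)
    except ValueError:
        return False
    return True
```

Both 0x00E00000 and 0x60000000 fail these checks. A 32-bit storage unit can of course hold them, but they fall outside the code space that Unicode defines.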