Talk:Standard Compression Scheme for Unicode

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Prosfilaes (talk | contribs) at 17:52, 30 July 2006 ("1 byte per character"). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

"1 byte per character"

When it says 1 byte per character (plus overhead) for many text files, that's exactly what it means. As long as they don't use obscure control characters, all text files in ASCII or ISO-8859-{1,5-9,11} use 1 byte per character plus a couple of bytes at the start to set the mode. Not ~1 byte, 1 byte.--Prosfilaes 06:22, 24 March 2006 (UTC)

You're right; I shouldn't have added the tilde. —Steve Summit (talk) 15:45, 24 March 2006 (UTC)
You are correct about ISO-8859-1, but other common encodings, such as windows-125x and the other parts of ISO-8859, do not map to contiguous Unicode code points. Plugwash 23:16, 29 July 2006 (UTC)
ISO-8859-{1,5-9,11} map to contiguous Unicode code points. I think that's many text files.--Prosfilaes 17:52, 30 July 2006 (UTC)
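
One rough way to check the contiguity question raised in this thread is to decode the upper half (bytes 0xA0 to 0xFF) of each ISO-8859 part and count how many characters land in each block of 128 code points, since an SCSU window covers 128 code points. The sketch below is illustrative only: the helper name `upper_half_blocks` is made up here, it relies on Python's built-in `iso8859_N` codecs, and it treats windows as 128-aligned blocks, which is a simplification of the window positions SCSU actually allows.

```python
from collections import Counter


def upper_half_blocks(codec):
    """Count how many bytes 0xA0-0xFF decode into each 128-code-point block.

    Hypothetical helper: keys are block start code points (multiples of
    0x80), values are the number of upper-half bytes that decode into it.
    """
    blocks = Counter()
    for b in range(0xA0, 0x100):
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte is unassigned in this code page
        blocks[ord(ch) // 0x80 * 0x80] += 1
    return blocks


# The parts named in the discussion above.
for part in (1, 5, 6, 7, 8, 9, 11):
    counts = upper_half_blocks(f"iso8859_{part}")
    summary = ", ".join(
        f"U+{start:04X}: {n}" for start, n in sorted(counts.items())
    )
    print(f"ISO-8859-{part}: {summary}")
```

A part whose upper half falls almost entirely into a single block is a good candidate for one byte per character under a single window; characters that land in other blocks would need extra bytes whenever they occur.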