Talk:Unicode input

Windows EnableNumKeypad clarification

Can someone please add a note about how, when using the Windows hexadecimal entry method involving EnableNumKeypad and Alt + <+>, one enters the hexadecimal digits A through F, which are not on the numeric keypad? —Largo Plazo (talk) 13:09, 9 December 2009 (UTC)[reply]

I assume you mean the EnableHexNumpad statement? I'm sorry I cannot answer your question. Since the original reference source is down and I cannot reproduce the effect on my version of Windows (Windows 7) to verify its accuracy, I'm putting a dubious stamp on this particular section. --oKtosiTe ^talk 17:21, 4 December 2010 (UTC)[reply]

I've used this in Windows Vista (32- and 64-Bit versions) for a long time, so I can tell you that it does work. I just got and tried it in Windows 7 to no effect, but now that I've tried it again after several reboots and shut downs, it does seem to work. I'm guessing that a simple reboot is all that's required to make the registry change take effect.

Hexadecimal codes involving letters are entered using the standard letter keys. It's very inconvenient, but the functionality is there.

It works on Windows 7, but you do have to reboot after setting the registry key. I've updated the article and removed the "dubious" flag. —Preceding unsigned comment added by 213.246.131.69 (talk) 10:46, 6 January 2011 (UTC)[reply]

So my keyboard/Windows setup has a state: the numpad does either decimal or hexadecimal (in combination with the A-F keys) interpretation. Note that when I type "ALT + 92", this could be 92 hex. (By the way, is there an extra numpad keyboard, with a USB connection, that has all 16 hex digits?) -DePiep (talk) 21:02, 6 January 2012 (UTC)[reply]

5-Digit codes

FYI... on the Mac, it appears you are limited to only characters in the Basic Multilingual Plane. I've not been able to find any information about inputting 5-digit codes for the supplementary planes. The Unicode Hex Input method works only with 4-digit codes.

I've added an explanation on how to do this on Mac OS. However I cannot find an authoritative source. Donlibes (talk) 03:46, 5 January 2012 (UTC)[reply]

In linux the same. In Windows???--Wickey-nl (talk) 15:18, 10 April 2011 (UTC)[reply]

On Windows (at least on Windows 7), Alt-x works on 4, 5 and 6 digit codepoints (i.e. any Unicode character). BabelStone (talk) 22:59, 10 April 2011 (UTC)[reply]

Did you install extra fonts? After doing so, I could use 5-digit codes on linux. Firefox seems to recognize the system fonts.
The quivira-font has quite a lot of characters.--Wickey-nl (talk) 20:02, 15 April 2011 (UTC)[reply]

Maybe it works on Windows 7. On Vista, however, you can definitely not enter 5- or 6-digit codepoints in this way.

DIBA--193.138.91.175 (talk) 12:01, 15 December 2011 (UTC)[reply]

Are you certain? I've just tested with WordPad on Windows XP, and using alt-x I was able to convert 1000, 10000, 20000 and 10FFFF to the corresponding Unicode characters (of course, without the appropriate fonts, they may appear as square boxes, but I verified that the codes really had been converted correctly). BabelStone (talk) 12:17, 15 December 2011 (UTC)[reply]
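For reference, a minimal Python sketch of what those four hexadecimal values denote, independent of any Windows input method. The helper name `describe` is hypothetical; it only reports whether a code point lies outside the Basic Multilingual Plane and how many UTF-16 code units it occupies.

```python
def describe(hex_digits):
    """Return (code point, is_supplementary, number of UTF-16 code units)."""
    cp = int(hex_digits, 16)
    ch = chr(cp)
    # Each UTF-16 code unit is two bytes; supplementary characters need a surrogate pair.
    units = len(ch.encode("utf-16-le")) // 2
    return cp, cp > 0xFFFF, units


for digits in ("1000", "10000", "20000", "10FFFF"):
    print(digits, describe(digits))
```

Only the first value fits in a single 16-bit unit; the other three are supplementary-plane code points, which is why 5- and 6-digit input is a separate question from 4-digit input.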

Unicode.org

I notice that http://www.unicode.org/ (specifically http://www.unicode.org/Public/6.0.0/charts/CodeCharts.pdf -- warning: 75Mb file) is not referenced in either the Unicode or Alt-code pages and instead private sites are referenced. Does anyone know why this decision was made? — Preceding unsigned comment added by 24.77.26.31 (talk) 19:50, 2 December 2011 (UTC)[reply]

Because people like to promote their own site or their favourite site? The Unicode page does link to the official code charts (which is better to link to than http://www.unicode.org/Public/6.0.0/charts/CodeCharts.pdf as it will always reflect the latest version of Unicode, whereas the 75MB pdf will be out of date next spring). Personally I would remove links to all the private sites, and only link to the official Unicode code charts, as the private sites tend not to keep up to date with new versions of Unicode, but I got reverted when I tried to prune the external links on the Unicode page. BabelStone (talk) 12:26, 15 December 2011 (UTC)[reply]

I agree on using the Unicode links Babel's way, but I disagree on deleting other links. E.g. [1] has extra options, such as text search (single word in character Names), and Full list (of say general category: Symbol, Other). Being out of date in the future is a minor point in the tradeoff (esp when going from 6.0.0 to 6.0.1 ;-) ). -DePiep (talk) 18:16, 15 December 2011 (UTC)[reply]

Request for clarification concerning hexadecimal code input in Microsoft Windows using the Alt key

The left Alt key works for entering Unicode characters; I can't say anything about a right Alt key, as my keyboard doesn't have one. The AltGr key doesn't work for entering Unicode characters. I hope this clarification can be considered sufficient, and that the request for clarification can therefore be removed from the article.K1812 (talk) 19:58, 18 August 2014 (UTC)[reply]

Concerning Unicode input in Microsoft Windows and request for citation

Concerning the request for citation: the Windows 8.1 registry initially doesn't have the value EnableHexNumpad, so if you want to enter Unicode characters the way that's described in this article, you need to edit the registry, add the string type value EnableHexNumpad, and assign the value data 1 to it. While editing, I erroneously removed your request for citation. If you don't consider the above explanation to be sufficient, please add the request for citation again.K1812 (talk) 20:24, 18 August 2014 (UTC)[reply]
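As an illustration of "a string value with data 1", here is a hedged Python sketch that only builds the text of a .reg file; it does not touch the registry. The key path `HKEY_CURRENT_USER\Control Panel\Input Method` is an assumption (it is not stated in this discussion), and the function name `hex_numpad_reg_text` is hypothetical.

```python
def hex_numpad_reg_text():
    """Build .reg file text adding the string value EnableHexNumpad with data "1"."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        # Assumed key path; not given in the discussion above.
        r"[HKEY_CURRENT_USER\Control Panel\Input Method]",
        # "name"="data" is the .reg syntax for a string (REG_SZ) value.
        '"EnableHexNumpad"="1"',
        "",
    ]
    return "\r\n".join(lines)


print(hex_numpad_reg_text())
```

Note that the bracketed line names a key, while the quoted line below it adds a value under that key. According to the earlier comments on this page, the setting appears to need a reboot before it takes effect.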

Request for citation concerning Microsoft Windows versions

As I rewrote the paragraph, I accidentally deleted the request for citation. I have used the described method on Vista and Windows 8.1. Others have used it on Windows 7. I couldn't get it to work on Windows 95. I suppose the reason might have been that Win 95 initially might not have supported Unicode at all. There is some sort of Unicode add-on for Windows 95, but at the time, I couldn't even download it from Microsoft. Please add your request for citation again if you want more sources.K1812 (talk) 20:51, 18 August 2014 (UTC)[reply]

Edit by Loginnigol of 23 September 2016

@ Loginnigol: excuse me, but you removed important information from the article and made the instructions woolly instead. Instead of leaving in the article the instruction that the user should add a value to a registry key, you instruct the user to add a line to the registry. A line in the registry can mean another key or another value. It's important to distinguish between keys and values when editing the registry. Failing to do so can and will produce a mess. I have now restored the instruction that a value, and not a key, should be added by the user. --K1812 (talk) 06:44, 24 September 2016 (UTC)[reply]

What do we do about RFC 1345?

I have moved the "Character Mnemonics" section here from the "Unicode input" article. Although the section (here demoted to a subsection) has a passing reference to Unicode 1.0, lumped together with "many other character sets", it doesn't bear much relation to Unicode specifically but rather to RFC 1345. The RFC 1345 character mnemonic for the Greek letter λ, for example, is l*, which corresponds to nothing in Unicode. (The code point is U+03BB and the HTML character entity name is "lambda".)

The section does seem to be good encyclopedic stuff, but I don't have the background to create a new article around it or to know of existing articles that can incorporate it.

I have deleted the last sentence of the preceding section, "Unicode input#In platform-independent applications", which read:

The capability of Vim to create custom mnemonics, as described below, which could be employed on an ad-hoc basis, requires the decimal code point.

Please: someone with the relevant knowledge incorporate the material in Mainspace appropriately. Peter Brown (talk) 22:08, 29 November 2018 (UTC)[reply]

=== Character mnemonics ===
RFC 1345 defines a large number (1,893) of suggested mnemonics for code points in Unicode 1.0 (as well as characters in ISO 2DIS 10646 and many other character sets in use at the time of publication). Although the document does not restrict the length of a mnemonic (for example, "10000R" for U+2182), most (1,338) of the mnemonics are two characters long, and most (416) of the remainder are three characters long. While never complete, and targeting obsolescent set definitions, the mnemonics themselves can still be used.

Vim allows mnemonics entry (confusingly called "digraphs" by Vim developers) in insert mode (the regular mode for typing text) with Ctrl+K followed by a two-keystroke RFC 1345 mnemonic; or, in addition, if the digraph option is set, by entering the first character followed by a backspace followed by the second character. Custom mnemonics can also be defined for arbitrary code points. (For example, "dig Gr 9881" associates "Gr" with U+2699 ⚙ GEAR.)

GNU Emacs allows mnemonics entry by switching to rfc1345 input mode (by default Ctrl+u Ctrl+\).

GNU Screen allows mnemonics entry with (by default) Ctrl+A Ctrl+V.

Zsh allows mnemonics entry using the insert-composed-char widget.

RFC 1345 predates the introduction of the Euro sign (€, U+20AC), but the above applications included it as the mnemonic "Eu".

→Section moved by Peter Brown (talk) 22:08, 29 November 2018 (UTC)←[reply]
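Since Vim's custom mnemonics take the code point in decimal, a small conversion helps when starting from the usual U+hhhh notation. This is a minimal sketch; the function name `vim_digraph_command` is hypothetical, and it only formats the command text quoted in the moved section.

```python
def vim_digraph_command(mnemonic, hex_code_point):
    """Format a Vim digraph definition, converting a hex code point to decimal."""
    if len(mnemonic) != 2:
        raise ValueError("a Vim digraph mnemonic is exactly two characters")
    return f"dig {mnemonic} {int(hex_code_point, 16)}"


print(vim_digraph_command("Gr", "2699"))
```

For U+2699 this reproduces the "dig Gr 9881" example, since 0x2699 is 9881 in decimal.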

I have added an abbreviated version of the Vim discussion (first bullet above) to the Unicode input#Decimal input subsection. Peter Brown (talk) 19:44, 30 November 2018 (UTC)[reply]

Selection from a screen and WP:BURDEN

Here, I have reverted another editor's deletion of the section "Selection from a screen". According to the policy WP:BURDEN, however,

The burden to demonstrate verifiability lies with the editor who adds or restores material, and is satisfied by providing an inline citation to a reliable source that directly supports the contribution. (Emphasis added)

Though the section admittedly lacks the required citations, this is a burden I am unwilling to assume. I am strictly a Windows user, unfamiliar with macOS, Linux and BabelMap. Further, I never use selection from a screen in my own work. I have written an AutoHotkey script to handle em dashes and a few other characters; for anything else, I happily use Hexadecimal input techniques. I am not about to undertake a major research project into approaches that I have no intention of ever using.

So, should I self-revert, leaving "Unicode input" without the section "Selection from a screen", a section that has been part of the article since its creation in 2008? That's not acceptable either. Such selection is a technique for Unicode input, popular enough that several developers have created applets to support it. The lead paragraph lists it as an alternative. Without this section, the article would be seriously deficient.

Ideas? Will any of you, who do use the selection techniques or at least are curious about them, undertake to provide suitable citations? Or must I self-revert? In the latter case, I should probably propose that the entire article be deleted since, without the section "Selection from a screen", it fails to accomplish its purpose. Is there another approach?

Peter Brown (talk) 17:12, 3 December 2018 (UTC)[reply]

I restored the info with proper sources. TimTempleton ^(talk) ^(cont) 19:21, 5 December 2018 (UTC)[reply]

The .notdef box

We have used U+10FFFF in the hope that it is not used anywhere and thus will force display of a tofu block. But that codepoint is "private use area" and someone somewhere will use it eventually. Can anyone think of a better solution? Or just cross that bridge when we come to it? --John Maynard Friedman (talk) 09:20, 18 June 2020 (UTC)[reply]

I’d suggest using a non-character, e.g. the first one U+FDD0 “﷐”.

Further we’d better stop mixing up glyphs and food items except for real emoji. BTW why not call it (a slice of) pie? At least that has a dough crust around it. Tofu is actually filled, not empty, and while a .notdef box is white on white paper, there is still the black border left to account for. — Hnvnc (talk) 11:24, 18 June 2020 (UTC)[reply]

I think you've got a bento box in mind (though that starts full and ends empty and may have contained tofu :-) Thank you for changing the section title, I can't believe I wrote that, having challenged it as jargon only yesterday.

Yes. I support your solution. --John Maynard Friedman (talk) 12:53, 18 June 2020 (UTC)[reply]

U+10FFFF is in a PUA block, but it is in fact a non-character (like all characters ending FFFE or FFFF), so it should not occur in any conformant font. In fact it is less likely to be (mis)used than FDD0, so I think leaving it as U+10FFFF is best. BabelStone (talk) 13:57, 18 June 2020 (UTC)[reply]
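The rule described here can be checked mechanically. Below is a minimal Python sketch, assuming the standard definition of the 66 noncharacters (U+FDD0 to U+FDEF plus the last two code points of each of the 17 planes); the function name `is_noncharacter` is hypothetical.

```python
def is_noncharacter(cp):
    """True if cp is one of the 66 Unicode noncharacters."""
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane end in FFFE or FFFF.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


for cp in (0xFDD0, 0xFFFF, 0x10FFFD, 0x10FFFF):
    print(f"U+{cp:04X}", is_noncharacter(cp))
```

By this test both candidates (U+FDD0 and U+10FFFF) are noncharacters, whereas U+10FFFD is an ordinary private-use code point.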

It is a bit more complicated

Looking at Quotation mark#Unicode code point table on my Android phone using Chrome, for U+2E42 Double low reversed-9 etc, a simple empty box is displayed, but at U+1F676 Sans-serif heavy etc I see a box crossed with diagonal line. So we haven't quite solved the issue, because it seems that there are actually two issues. ~~I suspect that we may need Hnvnc's solution and BabelStone solution?~~--John Maynard Friedman (talk) 12:00, 20 June 2020 (UTC)[reply]

Curiouser and curiouser: Hnvnc's box is displayed on Android with two diagonal lines, not an empty box. --John Maynard Friedman (talk) 12:13, 20 June 2020 (UTC)[reply]

Would it be acceptable to use U+25AF ▯ WHITE VERTICAL RECTANGLE as a simulacrum? --John Maynard Friedman (talk) 12:27, 20 June 2020 (UTC)[reply]

No I tried that, it looks too different from the error indicator.Spitzak (talk) 18:18, 20 June 2020 (UTC)[reply]

Yes, I know, too tall and too narrow. But we don't have to reproduce it exactly, we can say "similar to ▯". It is enough that we convey the idea, IMO. --John Maynard Friedman (talk) 19:48, 20 June 2020 (UTC)[reply]

U+2E42: ⹂ U+1F676: 🙶 U+10FFFF: 􏿿 U+25af: ▯ U+2c00: Ⰰ U+FFFF:  U+10FFFD: 􏿽 Spitzak (talk) 20:58, 20 June 2020 (UTC)[reply]

On mobile, I see valid characters for 2E42, 25AF, 2C00. All others render as box with diagonals except U+ffff which remained as . --John Maynard Friedman (talk) 22:41, 20 June 2020 (UTC)[reply]

As to why two different .notdef glyphs^[1] may show up in the same application, I think it depends on what font the renderer got stuck with when giving up. — Hnvnc (talk) 11:54, 21 June 2020 (UTC)[reply]

FWIW, I have the same version of Chrome on both platforms (Android and Chrome OS). --John Maynard Friedman (talk) 13:38, 21 June 2020 (UTC)[reply]

Firefox

Using Firefox 77.0 on Win 10 and Spitzak's test line, I see valid characters for 2E42, 25AF, 2C00. All others render as box with the hex squeezed in (two rows of three hex digits) except U+ffff which remained as . And the glyph displayed for U+25AF is short and fat, almost identical to the empty box shown by Chrome. --John Maynard Friedman (talk) 13:38, 21 June 2020 (UTC)[reply]

References

^ "Pet peeve: empty .notdef character". TypeDrawers. 2018-05-07. Retrieved 2020-06-21.

Decimal input (Windows)

This section is misleading. It implies that Alt+0nnn produces the Unicode codepoint at nnn₁₀. This is not true. The leading 0 only instructs the OS to choose the glyph from the currently-loaded Windows code page. (If the 0 is omitted, it uses the OEM code page.) By coincidence, for users with US or UK keyboard mapping, there may be sufficient overlap with low-value Unicode for their purposes, but it is certainly not a generic Unicode input method. I suspect it encourages the misapprehension that the word "Unicode" means "Latin characters not available as standard on my keyboard".

I propose to delete this material unless someone can come up with a convincing reason to keep it. --John Maynard Friedman (talk) 09:08, 12 September 2020 (UTC)[reply]

Oppose:

Using Random.org, I picked eight 4-digit decimal numbers at random and converted them to hexadecimal. Using Wikibooks:Unicode/Character reference, I then looked each of them up to determine what character, if any, had that number as a code point. Next, using Wordpad, I tried Alt+nnnn on each of the eight.

On two of the eight, the character was undefined according to Wikibooks. For both of them, Wordpad produced a ☐. One other, U+1BD7, is a "Batak letter northern ta"; Wikibooks could not produce a glyph but only ᯗ and Wordpad yielded a ⍰. For all of the others, the character that Wordpad called up matched that from Wikibooks.

I emphasize that the numbers were chosen randomly. While there may be a few exceptions, it appears that whenever Alt+nnnn yields a character in Wordpad other than ☐ or ⍰, the character is the one associated with it by Unicode. That's a lot of numbers. It certainly suggests that using the Alt code with a character's decimal code point is a pretty reliable way of producing that character.

Yes, Unicode input § Decimal input could use some improvement. The statement that

Microsoft Windows can input at least some Unicode code points using decimal typed on the numeric keypad by using Alt codes

is correct, though an understatement; Windows can input most code points that actually correspond to printable characters that way, at least with code points up to decimal 9999. It is necessary to input at least four digits, so a leading zero is needed for numbers less than 1000. The technique also doesn't work for Unicode control characters such as characters with decimal codes 0–31 or 128–159.

Peter Brown (talk) 18:29, 12 September 2020 (UTC)[reply]

Then it needs to be rewritten to state clearly that codepage 1252 creates invalid (to Unicode) binary values for characters that Microsoft has reassigned to the range 0080–009F and this makes documents that use them incomprehensible to other platforms.

For example, dagger and double-dagger, † and ‡, have the Unicode code points 2020₁₆ and 2021₁₆ (8224₁₀ and 8225₁₀) but CP1252 assigns them to 86₁₆ and 87₁₆ (134₁₀ and 135₁₀). Thus if a Windows user enters alt+0134, a dagger symbol will be displayed and printed on their Windows machine but the file thus created will be intelligible only to another user with Windows and CP1252. The reality is that the user has not created a Unicode code-point: indeed what they have encoded is not a valid character at all because it lies in the x80 to x9F 'reserved for control-codes' block.

But maybe not many people use the dagger symbol, so how about the euro symbol, €? Its Unicode code point is 20AC₁₆ (8364₁₀) but Windows CP1252 assigns it to 80₁₆ (128₁₀). And perhaps your nicely formatted press-release also uses curly quotes? If your publicist uses a Mac or your typesetter uses a *nix system, then you just look illiterate or incompetent or both.

It also needs to say that it can't deliver characters with numbers above 255₁₀ (FF₁₆). So no Eastern European haceks or macrons, overdots, underdots, comma-below, let alone Greek or Cyrillic. (and the explanation needs to be written without confusing the numeracy-challenged with incomprehensible talk of modulo 255).

It also needs to say that if you are in Japan or China or India or Russia and so have an entirely different Windows code-page default, then your Alt+0nnn will produce something completely different. --John Maynard Friedman (talk) 22:28, 12 September 2020 (UTC)[reply]
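The CP1252 assignments cited for the dagger, double dagger and euro can be checked with Python's built-in `cp1252` codec. This is only a sketch of the byte-to-character mapping; it makes no claim about what any particular Windows application stores in a file. The function name `cp1252_to_unicode` is hypothetical.

```python
def cp1252_to_unicode(byte_value):
    """Decode one CP1252 byte and return (character, Unicode code point)."""
    ch = bytes([byte_value]).decode("cp1252")
    return ch, ord(ch)


for byte_value in (0x86, 0x87, 0x80):
    ch, cp = cp1252_to_unicode(byte_value)
    print(f"CP1252 {byte_value} (0x{byte_value:02X}) -> {ch} U+{cp:04X} (decimal {cp})")
```

The three bytes sit in the 0x80 to 0x9F range, while the characters they stand for have Unicode code points above U+2000.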

Unicode input § Decimal input is indeed misleading, but not in the way suggested. It is not necessary that the decimal code point start with a zero; rather, as I noted in my previous post, "It is necessary to input at least four digits, so a leading zero is needed for numbers less than 1000." It is also necessary that code points less than 100 start with two leading zeros. The section is easily corrected to state the requirement correctly. No mention of CP1252 is necessary or even useful.

Unicode input is only concerned with methods to input characters given their Unicode code points. The dagger has a decimal code point of 8224, so a technique recommended by the article, when corrected, will be to enter Alt+8224. This works and, so far as I know, is independent of the code page. Yes, there is another technique, one relying on CP1252, but that in no way invalidates the technique, properly stated. Agreed, the user following the CP1252 procedure has not "created a Unicode code point" — code points are numbers, according to the Unicode standard and numbers are not created entities. Does U+0086 not encode a valid character? It's not a printable character, but it does lie within the subject matter of the Wikipedia Unicode control characters article, so there's certainly a case to be made for its being a character, specifically one designating "Start of Selected Area".

"How about the Euro Symbol €?", you ask. Same point: properly updated, Unicode input § Decimal input will tell us, correctly, that it can be produced by Alt+8364. Curly quotes? Alt+8216 through Alt+8223. Also macrons, such as the combining macron (U+0304, decimal 772) Alt+0772, which does have a leading zero. Greek and Cyrillic, such as α Alt+0945 and Д Alt+1044. And Japanese characters, like 侮, requiring five decimal digits: Alt+64048.

Peter Brown (talk) 02:19, 13 September 2020 (UTC)[reply]
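The decimal values quoted above can be derived from the characters themselves. A minimal sketch, assuming the "at least four digits" rule stated in this thread; the function name `alt_digits` is hypothetical, and whether a given application accepts the resulting Alt code is a separate question that this does not test.

```python
def alt_digits(ch):
    """Decimal code point of ch, zero-padded to at least four digits."""
    return f"{ord(ch):04d}"


for ch in "†€αД":
    print(ch, "Alt+" + alt_digits(ch))
```

This prints the padded decimal code point for each character, for example a leading zero for α because its code point is below 1000.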

Decimal input (Windows) Part 2

I bow to your more extensive knowledge and trust that you will clarify the article accordingly.

You say that the reference to CP1252 is not needed. So why is it that a user with Japanese layout gets something other than £ after typing Alt+0163? Does that not disprove your rule? 163₁₀ is certainly the correct Unicode value for the codepoint but Windows is delivering something from the 163rd slot in its Japanese code page which is definitely not £.--John Maynard Friedman (talk) 16:33, 13 September 2020 (UTC)[reply]

I've updated the article; please take a look at it. My claim is limited to Microsoft Word and Wordpad; it also works on LibreOffice Writer but not for Notepad, Chrome, or Firefox. What application is your Japanese friend using? Peter Brown (talk) 20:17, 13 September 2020 (UTC)[reply]

Said Japanese friend here. As discussed here I am indeed trying to produce £ in a plain-text context, such as Notepad, a text input box, or this Wiki editing area. When my 'keyboard' is set to Japanese (be it 'Japanese keyboard' or Microsoft IME - or indeed Chinese pinyin for that matter), Alt+0163 does not work (it produces ｣), and if I change to the Thai Kedmanee keyboard I get ฃ. If there were a 4- or even 5-digit code that worked (at one stage I had hopes for Alt+6556), that would be great, but what I currently see is that unless I switch the keyboard layout to e.g. UK or US and then use Alt+0163 (or Shift+3 in the UK keyboard), there is no simple way to input this Unicode character into such a text area. Ozaru (talk) 18:20, 14 September 2020 (UTC)[reply]
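The three results reported here match what byte 163 decodes to in three legacy code pages. A hedged sketch using Python's codecs, assuming (this is not stated in the thread) that the layouts in question are tied to CP1252 (Western), CP932 (Japanese) and CP874 (Thai):

```python
def decode_byte(value, codec):
    """Decode a single byte in the given legacy code page."""
    return bytes([value]).decode(codec)


for codec in ("cp1252", "cp932", "cp874"):
    ch = decode_byte(163, codec)
    print(codec, ch, f"U+{ord(ch):04X}")
```

Under that assumption the same keystrokes select the same byte, and only the code page used to interpret it changes.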

Of the applications you list, you're right: they provide no simple way to produce a £, at least none I know of. Of course, entering £ in the Wiki edit box will produce a £ in the resolved text, but that's not what you're after. Peter Brown (talk) 19:25, 14 September 2020 (UTC)[reply]

@Ozaru: Have you considered using a script language? I have an AutoHotkey script that runs by default; I use it for em dashes among many other things. The AutoHotkey script to make Ctrl+F produce a £ would be just ^f::£. Peter Brown (talk) 00:37, 15 September 2020 (UTC)[reply]

There are plenty of workarounds (e.g. Windows+Space to switch to UK/US, Alt+0163 then Windows+⇧ Shift+Space to switch back; or phonetically entering ぽんど into the IME and hitting Space one or more times to select the right symbol, or Autohotkey as you say). The issue is more that despite the best intentions of moving from 8-bit SBCS to 16-bit DBCS and standardizing with Unicode while computers themselves become 32 and 64-bit... it still seems impossible to break free from the 8-bit codepage legacy, which I find incredible. It's amazing (not to say inconvenient) that even now, VBA Editor doesn't support Unicode, Excel can't save Unicode CSV files, and basic Windows 10 dialogs etc. don't have a simple, in-built way for Unicode input. So much for I18N. Ozaru (talk) 05:55, 15 September 2020 (UTC)[reply]

Couldn't put it better myself (I didn't!). As already noted, the £ glyph is just a random example, the issue is widespread. Which takes me back to my first challenge to the section. It is worse than misleading while it remains unqualified. --John Maynard Friedman (talk) 16:05, 15 September 2020 (UTC)[reply]

"... while it remains unqualified." Sorry, what is "it"? My revised wording begins, "Some programs running in Microsoft Windows, including Word and Wordpad ...". Isn't that sufficient qualification? I don't see a need, here, to mention that whether one can produce £, Ð, etc. on Notepad or VBA depends on the code page in effect. Peter Brown (talk) 19:05, 15 September 2020 (UTC)[reply]

"it" = "the text". The text that says that this method works when the real story is "it depends". Setting ever tighter parameters so that we can continue to say that it works is being "economical with the truth". We need to say that the method doesn't work reliably for keyboard settings outside the Americas, Western Europe, Southern Africa, A&NZ and (former) Western European colonies. IMO. --John Maynard Friedman (talk) 20:07, 15 September 2020 (UTC)[reply]

I think it can be stated this way:

In some cases Microsoft extended the Altcode inputs so that Unicode code points could be typed as decimal numbers.

For the numbers 0–255 the user had to type a leading zero (so that the "ANSI" code page was used) and also the ANSI code page had to be set to something that matched the first 256 characters of Unicode for all useful characters (CP1252).

For numbers greater than 255 there were numerous different results, depending on the software being used and the version of Windows:

The number had to be prefixed with a zero to work.
At least 4 digits had to be typed (i.e. a leading zero for n <= 999) to work.
The numbers did not work at all (usually producing the character for n modulo 256).
Numbers greater than 65535 might not work even if smaller numbers do.

Spitzak (talk) 20:58, 15 September 2020 (UTC)[reply]
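The two behaviours being distinguished can be written down as a simplified model. This is a sketch under stated assumptions, not a description of what Windows actually does: `legacy_alt` and `direct_alt` are hypothetical names, CP1252 and CP437 are assumed as the ANSI and OEM code pages, and bytes that a code page leaves undefined would raise an error here.

```python
def legacy_alt(n, leading_zero, ansi="cp1252", oem="cp437"):
    """Code-page model: reduce n modulo 256, then decode that single byte."""
    codec = ansi if leading_zero else oem
    return bytes([n % 256]).decode(codec)


def direct_alt(n):
    """Unicode model: n is taken as the decimal code point itself."""
    return chr(n)


print(legacy_alt(192, leading_zero=True), legacy_alt(192, leading_zero=False))
print(legacy_alt(960, leading_zero=True), direct_alt(960))
```

In this model the leading zero matters only on the code-page path; on the direct path the typed number is used unchanged.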

Re Spitzak's four bullet points:

In Wordpad, Alt+960 and Alt+0960 both produce a π, which is the correct Unicode character. The high-order zero doesn't matter.
Same counterexample. Alt+960 works just fine.
960 ≡ 448 modulo 256, but in Word and Wordpad Alt+448 and Alt+0448 both produce, not π, but the dental click ǀ. Modulo 256 has nothing to do with it.

Peter M Brown edited his own comment to the above text, however his previous version makes mathematical sense: "960 ≡ 192 modulo 256, but in Word and Wordpad Alt+192 produces a └(per CP437) and Alt+0192 produces an À (per Unicode and CP1252). Modulo 256 has nothing to do with it." Basically the number 960 is irrelevant, the only interesting thing in the above statement is whether 448 turns into 448 or 192.Spitzak (talk) 19:42, 18 September 2020 (UTC)[reply]

Numbers greater than 65535 might not work? I've produced two cases of numbers that big that do work (one here and one in the article). Why is Spitzak so suspicious of the others?

I agree with John Maynard Friedman, above, that we should not confuse "the numeracy-challenged with incomprehensible talk of modulo 255," assuming that he really means 256. Spitzak evidently disagrees, as he has introduced such considerations into the article. However, Unicode input is, or should be, entirely concerned with Unicode input, with ways to produce characters when one knows their code points. Modulo 256, applicable to Notepad, outgoing Gmails, etc. could be discussed in the Alt code article, but it is not relevant here, because

discussion is limited to Word and Wordpad as well as similar programs like LibreOffice writer, and
for Unicode input purposes, the only point of knowing about equivalence modulo 256 (if it worked in Word etc.) is that, if one thought the number 666 accursed, one could produce the character ʚ using 154 or 410.

Peter Brown (talk) 01:47, 17 September 2020 (UTC)[reply]

I reverted your change to this talk because your edited version makes absolutely no sense. Nobody is suggesting any possible way that 960 will turn into 448, it will either turn into 960 or 192. Also your suggestion that modulus can go "backwards" and turn 154 into 666 is ludicrous (because 154, 410, 666, 922, 1178, ... are all possible answers and there is no reason to choose one of them, other than the first).

I put the non-BMP text in there because older text claimed that more than 4 digits might not work. I found it doubtful that 9999 was the cutoff and suspected typical Windows stupidity about non-BMP, which starts after 65535. It sounds like there is no such cutoff, either with 4 digits or at some point that requires more than 4 digits, so all such text is removed.Spitzak (talk) 18:20, 18 September 2020 (UTC)[reply]

Spitzak, there are no circumstances in which you should edit another editor's contribution unless it is a known troll. I strongly advise that you self-revert and apologize. If Peter raises an ANI, I would have to support him. --John Maynard Friedman (talk) 19:29, 18 September 2020 (UTC)[reply]

I'm reminded of the folk song "Green Grow the Rushes, O". Each verse ends with

"One is one and all alone and evermore shall be so."

One can never turn into two, nor can 960 turn into 192, despite Spitzak's claim to the contrary. My claim that he thinks "makes absolutely no sense" is

"960 ≡ 448 modulo 256, but in Word and Wordpad Alt+448 and Alt+0448 both produce, not π, but the dental click ǀ."

He evidently did not read, or did not credit, my edit summary:

"See Modular arithmetic#Examples. The numbers on both sides of the ≡ symbol can be greater than the modulus."

The indicated section in Modular arithmetic begins:

"In modulus 12, one can assert that $38 \equiv 14 \pmod{12}$ because $38 - 14 = 24$, which is a multiple of 12. Another way to express this is to say that both 38 and 14 have the same remainder 2 when divided by 12."

Likewise 960 ≡ 448 (mod 256) because 960−448 = 512, which is a multiple of 256. Also, 960 and 448 have the same remainder, 192, when divided by 256.

Peter Brown (talk) 03:29, 19 September 2020 (UTC)[reply]
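The congruences in this exchange are easy to verify. A minimal sketch with a hypothetical helper `congruent`, following the definition quoted from Modular arithmetic (two numbers are congruent modulo m when their difference is a multiple of m):

```python
def congruent(a, b, m):
    """True if a and b differ by a multiple of m."""
    return (a - b) % m == 0


print(congruent(38, 14, 12), 38 % 12, 14 % 12)
print(congruent(960, 448, 256), congruent(960, 192, 256), 960 % 256, 448 % 256)
```

Both 960 ≡ 448 and 960 ≡ 192 (mod 256) hold; 192 is simply the smallest non-negative member of that equivalence class, which is what the `%` operator returns.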

You are talking about all the numbers that are equivalent. I was talking about the modulo operator which returns the smallest of these numbers. In any case 960 ≡ 192 mod 256, and 960−192 = 768 = 3 × 256, so you have no reason to think 448 is more likely than 192. The weird thing is your example actually shows the correct characters you might get if you type 448 (either 448 or 192) but I still don't understand why you have 960 in that sentence. Just to prove this, let's ask Python what 960 mod 256 is, and make sure no 448 appears:

   >>> 960%256
   192

Spitzak (talk) 18:55, 19 September 2020 (UTC)[reply]

Of course I'm talking about equivalence. Why do you think I went to the trouble of generating the non-keyboard equivalence symbol ≡?

I have no reason to think that 448 is more likely than 192? Likelihood is relevant to indeterminate processes. We're doing math, not election forecasting.

As you noted, I switched from 192 to 448. I regarded brevity as a virtue and, with 448, I needed only to exhibit the one symbol ǀ rather than both └ and À.

Why did I use 960? I needed a number greater than 255, so a leading zero would make no difference, as a counterexample to your first and third bullet points. For the second point, it needed to be less than 1000. Finally, I thought it would be nice if it encoded a familiar non-Latin character and π, decimal code point 960, seemed a good choice because of its relevance to geometry.

Peter Brown (talk) 01:16, 20 September 2020 (UTC)[reply]

Sorry to keep this going, but I really think you have some misunderstanding of this. I cannot figure out exactly what your confusion is, but I am just trying to be helpful and correct it. Basically either mod-256 is applied to the number typed in or it is not. This means that 960 either turns into 960 or 192, and can therefore produce either π or À. And it means that 448 can either turn into 448 or 192, and can therefore produce either ǀ or À. What you have shown is that in Wordpad, the first case (no modulus) applies, for both letters. But neither example has improved "brevity" over the other. And you seem to think that showing that another number that is equivalent to 192 also does not have modulus applied somehow enforces the idea that "modulus has nothing to do with it". Of course modulus has nothing to do with the case that modulus is not used. IMHO a better proof would be to use a number that is not equivalent (just in case somebody wants to claim that you have only proven that modulus is not applied only to numbers that are equivalent to 192 modulus 256).Spitzak (talk) 18:23, 20 September 2020 (UTC)[reply]

You continue to write of numbers turning into each other. I wrote above that "One can never turn into two, nor can 960 turn into 192, despite Spizak's claim to the contrary." Since you continue to write of numbers turning into each other, you must mean something by this locution, but I find it baffling. Likewise your talk of numbers "having modulus applied". Only someone who understands this concept could "claim that [I] have only proven that modulus is not applied only to numbers that are equivalent to 192 modulus 256". As I do not understand, I could not reply.

My statement that "modulus 256 has nothing to do with it" was perhaps too vague. The context was the production of characters using the Alt key in Word or Wordpad; I meant only that, within this context, the character produced does not depend on what characters are equivalent modulo 256 to the number entered.

Peter Brown (talk) 21:05, 20 September 2020 (UTC)[reply]

I am having a very hard time trying to figure out what you are thinking. The words "turns into" mean: the user types the number 960, and the software eventually inserts a Unicode character with a certain code point, let's assume that for some reason this code point is 192. The input to this operation is the number 960, and the output is the number 192. I think it is extremely common to say "960 turns into 192" and am really curious why this term confuses you and how you would state it.

Using "turns into" your statement is "960 is equivalent to 448 modulus 256, and 448 turns into 448, not 192, therefore modulus has nothing to do with it". What you have shown is that modulus is not applied to 448. And the number 960 is completely irrelevant to this conclusion.

The other question is why you think using 448 instead of 960 somehow increases "brevity". My best guess is that you think the system might turn 960 into 448 and that you are avoiding the difference between ANSI and OEM code pages? But then you correctly indicate that 448 turns into 448, not involving 960 at all, and even correctly identify the code point 448 would turn into if modulus 256 was applied (192, using the character from the ANSI code page). I am really trying to figure out your logic here. Perhaps you could write the "less brevity" version using 960 so I could get some idea of what in the world you are thinking? Spitzak (talk) 18:53, 21 September 2020 (UTC)[reply]

I find this use of "turns into" quite bizarre and I still don't really get it. According to you, "The input to this operation [typing a number] is the number 960 and the output is the number 192." No, in Word or Wordpad, the output is the character π, which has a decimal code point of 960; in Notepad or in the Wiki edit box, it's └, which has a decimal code point of 9492.

I don't know Python, but I do know Excel. It has a "mod" function of two variables, formatted "mod (a,b)", which returns the remainder from division, the least number r ≥ 0 such that a = nb + r for some integer n ≥ 0. Perhaps, by "x turns into y" you mean that y = mod (x,256)? That would fit one of your examples, as 192 = mod (960,256). However, this interpretation doesn't fit your claim that 960 could (depending on what?) turn into 960, since 960 ≠ mod (960,256).

As regards brevity, surely it is briefer to display the one character ǀ rather than to display two characters, └ and À.

Also, you have not explained what it is for a modulus to be "applied". Knowing what it is for paint or fertilizer to be applied does not get me very far.

Peter Brown (talk) 22:22, 21 September 2020 (UTC)[reply]
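As a possible aid to the discussion, here is a short Python sketch of the remainder computation and of the two characters that the value 192 can stand for. It assumes that the OEM code page is 437 and the ANSI code page is 1252; the thread mentions OEM and ANSI code pages but does not say which ones are installed, so other systems may give different characters.

```python
# Python counterpart of the Excel expression mod(960, 256):
# divmod returns (n, r) with 960 == n * 256 + r and 0 <= r < 256.
quotient, remainder = divmod(960, 256)

# Interpret the remainder as a single byte in two legacy code pages.
# Assumption: OEM = cp437, ANSI = cp1252 (not stated in the thread).
low_byte = bytes([remainder])
oem_char = low_byte.decode("cp437")
ansi_char = low_byte.decode("cp1252")

print(quotient, remainder)         # 3 192
print(oem_char, ord(oem_char))     # └ 9492
print(ansi_char, ord(ansi_char))   # À 192
```

On these assumptions the same byte value 192 corresponds to └ in the OEM code page and to À in the ANSI code page, while π itself is reached only when the full number 960 is used.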

[1] "Pet peeve: empty .notdef character". TypeDrawers. 2018-05-07. Retrieved 2020-06-21.
