
Talk:C0 and C1 control codes

From Wikipedia, the free encyclopedia

does anyone here have access to ISO/IEC-6429


and if so can they check the codes in the C1 table (particularly the 3 not identified by Unicode) against it? Plugwash 02:34, 23 January 2006 (UTC)

ECMA-48, the European version of this standard, is available online. --Random832 23:32, 1 July 2007 (UTC)

Supposedly ECMA-48 is identical (and is available for free). The ISO (and ANSI) documents all cost money. Tedickey (talk) 10:23, 10 March 2008 (UTC)[reply]

2024

What are "the 3 not identified by Unicode"? The Unicode 15.1 version of the Unicode chart of C1 controls and Latin-1 Supplement, and the 1992 version of ISO/IEC 6429, have the same set of C1 controls, except that Unicode has 0x84 as IND and ISO/IEC 6429 doesn't, but, as the note attached to IND says, it was "Deprecated in 1988 and withdrawn in 1992 from ISO/IEC 6429 (1986 and 1991 respectively for ECMA-48)". I'll attach references in response to the "[citation needed]" for that.
Otherwise, the table matches both that version of Unicode and that version of ISO/IEC 6429. Guy Harris (talk) 09:56, 29 May 2024 (UTC)[reply]
0x80, 0x81, and 0x99. Search below for "Notes Regarding Omissions" Spitzak (talk) 18:52, 29 May 2024 (UTC)[reply]
OK, those aren't mentioned in ISO/IEC 6429 or ECMA-48, either; the notes in question say they were proposed for ISO 10646, but not accepted. Guy Harris (talk) 19:14, 29 May 2024 (UTC)[reply]

Is "String Terminator" abbreviated "SI"?


Control code 0x9C is listed as:

0x9C SI ST String Terminator

However, SI is the abbreviation for:

0x0F SI Shift In

Is the SI in String Terminator supposed to be ST?

24.234.114.35 21:34, 4 May 2007 (UTC)[reply]

Fixed, source RFC 1345 says ST. --217.184.142.52 (talk) 19:52, 16 June 2008 (UTC)[reply]

C1 not derived from/used in ISO/IEC 8859-n


The C1 codes were included in the ISO-8859-n series of encodings [...].

I think this is wrong if ISO-8859-n means ISO/IEC 8859. I only have access to draft versions of ISO/IEC 8859, but they explicitly say (C1 code points) use is outside the scope of ISO/IEC 8859; it is specified in other International Standards, for example ISO/IEC 6429., see here. --Abdull 08:10, 8 June 2007 (UTC)[reply]

There is a subtle but important difference between ISO/IEC 8859-1 and the IANA charset ISO-8859-1. One is an incomplete standard without control codes; the other adds them in to make a usable standard. Plugwash 21:42, 1 July 2007 (UTC)

2024

The Unicode standard claims that code points 0x00 through 0xFF are inherited from ISO 8859-1 (not from any IANA character set), but the Unicode standard is making a false claim there; the non-draft ISO/IEC 8859-1:1998 explicitly declares all the control character code points to be out of its scope. I've updated the page to indicate where Unicode really got code points 0x00-0x1F, 0x7F, and 0x80-0x9F. Guy Harris (talk) 0:53, 29 May 2024‎
And Unicode doesn't describe what most of them do; see section 23.1 "Control Codes" of the Unicode 15.0 specification. Guy Harris (talk) 00:31, 30 May 2024 (UTC)[reply]
A reference to back up where Unicode got the code points from would be nice. DRMcCreedy (talk) 00:45, 30 May 2024 (UTC)[reply]
"Got [them] from" in what sense? Reserving 0x00-0x1F and 0x80-0x9F for the C0 and C1 control characters, respectively, came from ISO 2022. The semantics for the few control codes to which semantics are assigned, and the character name aliases, came from ISO 6429. C0 and C1 control codes § Unicode uses section 23.1 "Control Codes" of the Unicode specification as a reference. Guy Harris (talk) 01:06, 30 May 2024 (UTC)[reply]
The scope of ISO 8859 is, indeed, to define specific graphical character sets for use with level 1 of ISO 4873. ISO 4873, in turn, is a subset of ISO 2022. Hence, the concept of the C0 and C1 controls is defined by standards that ISO 8859 is designed to conform to. Notably, Unicode itself does not conform to either of those standards.
The relevance of ISO 8859 isn't that ISO 8859 itself defines anything to do with control codes (it doesn't, as you correctly observe), but that Unicode finished up with the C0 and C1 control codes (despite not itself being ISO-2022-based) on account of starting off by stipulating that existing data conforming to ISO 8859-1 (which would usually use some control code set, although it would still be conformant ISO 8859-1 if it didn't) should be mapped directly to U+0000–U+00FF. It so happens that Unicode continued to be used with a subset of the C0 set from ISO 6429 (i.e. using LF or CR+LF, as opposed to Unicode's own LSEP, as the end-of-line convention), and the likes of the Unicode Line Breaking Algorithm reflect this established practice.
It is certainly true that the control codes did not originate on account of ISO 8859, and that one would be unsuccessful trying to look for information about them in ISO 8859 itself.
--HarJIT (talk) 15:26, 1 June 2024 (UTC)[reply]
The 30 May 2024 edit removed the verbiage that I felt needed a reference ("Unicode inherits code points 0x00-0x1F and 0x80-0x9F from ISO/IEC 6429:1992") so my comment/request is now moot. DRMcCreedy (talk) 15:52, 1 June 2024 (UTC)[reply]

CUA stuff


A few of the entries describe the use of a control key as a shortcut in many Windows programs and CUA X11 programs. For example: "In many programs, a keyboard input of Ctrl-Y is a "redo" command to undo the last Ctrl-Z undo command."

That's true, but the fact that Microsoft, when porting their Office software from the Mac to their own OS, used control keystrokes as a substitute for the missing command key has nothing to do with the meaning of any control character as a C0 control code.

Even if I'm completely wrong, I can't imagine how the undo/redo meanings of ^Z/^Y could be relevant but the clipboard meanings of ^X/^C/^V, the file command meanings of ^N/^O/^S, or the select-all meaning of ^A, the find-related meanings of ^F/^G/^R, etc. --75.36.140.83 07:36, 24 September 2007 (UTC)[reply]

That stuff appears to have been removed. Guy Harris (talk) 01:08, 30 May 2024 (UTC)[reply]

RFC 1345


Do we really need to include the RFC 1345 acronyms? Aside from some limited usage in a UNIX utility, I haven't come across any evidence that they saw use elsewhere. Caerwine Caer’s whines 22:32, 16 June 2008 (UTC)

I'd tend to agree - though deciding whether to remove them would take some investigation Tedickey (talk) 00:43, 17 June 2008 (UTC)[reply]

Backspace


The comments about backspace, and its linked topic do not mention its use for underlining and bold. The comment in the table is rather crowded, but rather than a blanket "deprecated", the point should be made that while composition of characters is not generally supported in terminals, the underline/bold generally are Tedickey (talk) 12:19, 19 June 2008 (UTC)[reply]


I think the description of Backspace is incorrect. This character does not have different uses for input and output (the same is true of the CR and ESC characters, for example): it always moves the cursor leftwards, so the phrase "To provide disambiguation between the two potential uses of backspace" makes no sense.

A more precise description could be one in the same style of CR or ESC characters, for example:

Move the cursor one position leftwards. The Backspace key on a keyboard will send this character that is usually used to delete the character to the left of the cursor; to do that the three character sequence BS SPACE BS (0x08 0x20 0x08) is used. In early computer technology, where a character once printed could not be erased, the backspace was sometimes used to generate combinations of two characters, like à that could be produced using the three character sequence a BS ` (0x61 0x08 0x60), the method to print underline or overstrike characters combining _ or - with any character, or the standard method in APL programming language to create new operators combining two existing operators, like / BS - Aacini (talk) 05:35, 2 November 2008 (UTC)[reply]
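
The byte sequences mentioned in the proposed description can be written out as a minimal Python sketch. The helper names (`erase_previous`, `overstrike`, `underline`) are hypothetical and only illustrate the sequences; placing the underscore before the character in `underline` is an assumption, since on a printing terminal either order leaves the same marks on paper.

```python
BS = b"\x08"  # Backspace, C0 control code 0x08


def erase_previous() -> bytes:
    """BS SPACE BS: step left, blank the cell, step left again."""
    return BS + b" " + BS


def overstrike(first: str, second: str) -> bytes:
    """Print `first`, back up one position, then print `second` on top."""
    return first.encode("ascii") + BS + second.encode("ascii")


def underline(text: str) -> bytes:
    """Underline each character by overstriking an underscore with it."""
    return b"".join(overstrike("_", ch) for ch in text)


print(erase_previous().hex(" "))       # 08 20 08
print(overstrike("a", "`").hex(" "))   # 61 08 60
print(underline("hi"))
```

On a display terminal the second character usually replaces the first instead of combining with it, which is why the erase sequence works there while the accent composition does not.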

agree Tedickey (talk) 18:44, 2 November 2008 (UTC)[reply]

This article is not about all control characters


Just a friendly reminder. This article is not about every possible usage of a control character, nor even about usage on every system where 00HEX–1FHEX are control characters. This is about a specific set control characters, the C0 and C1 sets as defined by ISO/IEC 2022. Some of those meanings are generalized, so while instances where an application or system further defines their usage are relevant, a use which is totally unrelated to the character as defined in ISO/IEC 2022 belongs in either a separate article or in control character. Caerwine Caer’s whines 02:58, 12 July 2008 (UTC)[reply]

unclear lines


The section C1 (ISO 8859 and Unicode) will become clearer if "if being used in an environment where 8-bit characters are not supported or where these octets are being used instead to add additional graphics characters" is removed. Also, I have passed a '+' outside the parentheses in a table column label. —Preceding unsigned comment added by 122.169.5.54 (talk) 08:46, 12 January 2010 (UTC)[reply]

The sentence could be broken up, but removing it would lose the hint for why 7-bit controls are useful. (Sending 2 bytes instead of 1 is not necessarily a good thing). Tedickey (talk) 09:33, 12 January 2010 (UTC)[reply]

C1 (ISO 8859 and Unicode)


I renamed the heading "C1 (ISO 8859 and Unicode)" as "C1 set" since C1 is not defined in either ISO 8859 or Unicode. C0 and C1 can be used in ISO 8859 or Unicode text, but they don't define C0 or C1. — Preceding unsigned comment added by 88.112.175.168 (talk) 10:06, 27 September 2011 (UTC)[reply]

And so what is «C0 Controls and Basic Latin» and «C1 Controls and Latin-1 Supplement» in Unicode standard?
  1. http://www.unicode.org/charts/PDF/U0000.pdf
  2. http://www.unicode.org/charts/PDF/U0080.pdf — Preceding unsigned comment added by 84.97.14.22 (talk) 06:27, 19 July 2012 (UTC)[reply]
ECMA-35 and ECMA-48 define the use of C0/C1 for ISO-8859-1. Without a document such as that for Unicode (or UTF-8), all the documents that you have mentioned do is to show pictures of the codes that are mapped from ISO-8859-1; the C0/C1 behavior has not been specified. A reliable source on the matter would not leave leeway for guessing what might be meant TEDickey (talk) 08:16, 19 July 2012 (UTC)[reply]
I just want to say that the Unicode standard
  1. recognizes those values as control characters,
  2. gives their range and aliases,
  3. as characters, implicitly assigns them a byte sequence depending on the UTF in use.
Maybe you just want to say that Unicode does not specify the exact behavior of each control character.
Additionally, a link can be established to Unicode control characters.
In The Unicode Standard, Version 6.1 page 23, they say: Basic Type control is «Usage defined by protocols or standards outside the Unicode Standard», and classifies them as category Cc with status abstract character.
And they add «Control Codes. Sixty-five code points (U+0000..U+001F and U+007F..U+009F) are defined specifically as control codes, for compatibility with the C0 and C1 control codes of the ISO/IEC 2022 framework. A few of these control codes are given specific interpretations by the Unicode Standard. (See Section 16.1, Control Codes.)»
§16.1 is in page 544 for C0.
In page 545 an additional semantic is clarified for at least eleven of them «Specification of Control Code Semantics» — Preceding unsigned comment added by 84.97.14.22 (talk) 11:18, 19 July 2012 (UTC)[reply]
But that's the point: the paragraph as written states that Unicode "provides" these codes, but it is in a context (and no clarification is made there) to point out that Unicode provides no definition of their behavior. The C1 codes without being translated would be illegal in UTF-8 encoding (because the values in 128-159 are continuation bytes). Without clarification, the paragraph is misleading. The word "provides" is inappropriate in this context - "assigns" would be more idiomatic, and corresponds to the sources you indicate TEDickey (talk) 22:32, 19 July 2012 (UTC)[reply]
C1 is not illegal in UTF-8. U+0085 (NEL / Next Line) is encoded as C2 85 in UTF8. I found this document which suggests that:
I don't know if that claim is true. But I tested a number of terminal emulators, and GNU Screen and Mosh were the only terminal emulators I tested that supported C2 85 as a newline character. --Hirsutism (talk) 21:07, 11 October 2012 (UTC)[reply]
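
The encoding point in this exchange can be checked with a short Python sketch. The helper names `utf8_c1` and `is_valid_utf8` are hypothetical. The sketch shows only how the bytes are encoded and says nothing about what any terminal does with them.

```python
def utf8_c1(code_point: int) -> bytes:
    """Return the UTF-8 encoding of a C1 control (U+0080..U+009F)."""
    if not 0x80 <= code_point <= 0x9F:
        raise ValueError("not a C1 control code point")
    return chr(code_point).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes as UTF-8 without error."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(utf8_c1(0x85).hex(" "))        # c2 85 (NEL as a two-byte sequence)
print(is_valid_utf8(b"\xc2\x85"))    # True
print(is_valid_utf8(b"\x85"))        # False: a lone continuation byte
print("a\x85b".splitlines())         # Python's str.splitlines treats NEL as a line boundary
```

So both statements above hold at once: a bare byte in the range 0x80-0x9F is not valid UTF-8, while the code points U+0080-U+009F are encodable as two-byte sequences beginning with 0xC2.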
Screen isn't a terminal emulator; nor is mosh - they're applications which use terminals and rely upon those to provide a lot of the functionality associated with a terminal emulator. TEDickey (talk) 21:31, 11 October 2012 (UTC)[reply]
Yes, Mosh does do terminal emulation. See here: "... the opportunity to build a clean UTF-8 terminal emulator from scratch ...". Mosh significantly reinterprets control characters and escape sequences, before sending them to the final terminal emulator. -Hirsutism (talk) 22:36, 11 October 2012 (UTC)[reply]
I'm aware of the opinion of its developer(s), but since it relies on the terminal (and ncurses) for the functionality, it's like screen - a translator which isn't a complete terminal emulator. You're not likely to find an authoritative source which agrees with that opinion. TEDickey (talk) 22:56, 11 October 2012 (UTC)[reply]
We're getting stuck in a side-tangent here. The precise definition of "terminal emulator" isn't important for this Wikipedia page. What matters here is: Putty + Mosh recognize NEL (encoded as C2 85) as a newline character. Even this empirical evidence is a side-tangent... the main discussion is about whether the Unicode spec fully recognizes NEL (or other C1 characters). --Hirsutism (talk) 15:28, 12 October 2012 (UTC)[reply]
Sure. But your suggested source isn't what one might term authoritative, due to several simple errors. For example, on the paragraph following the one you're interested in, he states

Since VT100 (that uses C1 extensively)...

which is incorrect. Scanning quickly, I see other errors. If you're simply stating that you can find someone agreeing with your point, that's easily done of course (google is your friend). TEDickey (talk) 23:03, 12 October 2012 (UTC)[reply]

Octal


Would anyone object were we to add Octal to the table also? We already have decimal and hex. Maratrean (talk) 08:16, 29 October 2011 (UTC)[reply]

Octal is wonderful, but hasn't its time passed? An extra column would be quite confusing, so why add it? There are probably lots of people who really have no interest in octal, so I think a good reason for adding it would be needed. Johnuniq (talk) 09:10, 29 October 2011 (UTC)[reply]
I object too. Of course, octal is derived from hex (or decimal), so it would just be a dependent (derivable) addition. One could add that decimal is too - all right. But decimal is used directly nowadays (e.g. when entering by keyboard). Someone else could argue: hey, let's add UTF-8, UTF-16, and such. So I do object. -DePiep (talk) 22:14, 30 October 2011 (UTC)


The 'C' column includes many missing entries. In the language 'C' it is ordinary to use octal escape sequences to express and enter these missing entries. Why not fill out the missing entries in the C column in octal - such as '\003' - solves the OP, completes the column, and provides a reference to programmers wishing to use the control codes under discussion. — Preceding unsigned comment added by 92.21.236.161 (talk) 00:20, 5 February 2015 (UTC)[reply]

7F


7F is delete. Which control code operates this? Kg pwn (talk) 22:55, 14 June 2012 (UTC)[reply]

In Unix, it's sometimes referred to as "Ctrl-?" or "^?"... AnonMoos (talk) 05:25, 15 June 2012 (UTC)[reply]
Yeah, but is it like... C2... or something — Preceding unsigned comment added by Kg pwn (talkcontribs) 19:25, 1 August 2012 (UTC)[reply]

Neither - ECMA-35 / ISO-2022 make SPACE and DELETE special cases (not control characters, and not a member of C0/C1). The positions used for those in the 128-255 range are printable characters, by the way. TEDickey (talk) 23:55, 1 August 2012 (UTC)[reply]

Restructuration


I suggest restructuring this article as follows:

  • Principles
    (why control codes)
  • History
    (main dates)
  • Interoperability
    • Main standards interoperability issues
      utf-8, windows-1252, etc.
    • Main protocols and applications
      terminal, file text, unix, videotext, etc
  • Code assignations
    • C0 set
    • C1 set
  • Example of sequence using control code — Preceding unsigned comment added by 84.97.14.22 (talk) 17:25, 19 July 2012 (UTC)[reply]

Various standards


http://www.itscj.ipsj.or.jp/ISO-IR/2-6.htm — Preceding unsigned comment added by 77.198.9.102 (talk) 23:21, 24 July 2012 (UTC)[reply]


These links are all circular, or point to articles about usage of shortcut combinations on Windows, which has nothing to do with control codes. I recommend reverting the addition of them. Spitzak (talk) 05:20, 21 September 2013 (UTC)

I partially agree with your observation, but not with your conclusion.
I deliberately put the links in because semantically there is a difference between a control character given in notation ^X (specifies a key combination with Ctrl, not a specific function - associated functions are operating system and application specific), a control character given in notation \x (specific formatting to some programming languages), named control characters distinguished by function (Linefeed, Tabulator, Bell, Null) or named control characters distinguished by code (NUL, ETX, etc.) in specific standards like ASCII etc.
While not being circular, at present some of the links have the same target (which often does not reflect the above semantics correctly), but this is a problem of sub-optimal target linking in redirects rather than a problem of adding local links to the terms as is. We will have to retarget some redirects and restructure some articles to create semantically more correct link targets, but this won't happen overnight. However, we will create awareness of this "unevenness" only by starting to incorporate the links; over time, this will create a momentum which will help to shift the targets to be more semantically correct. If we don't add the links, neither the semantic differences nor the structure will become apparent to most users, so changes in this area would happen only randomly and without a clear direction rather than systematically following some overall structure.
--Matthiaspaul (talk) 11:12, 21 September 2013 (UTC)[reply]
The ^X notation actually indicates the character with the value of an ASCII 'X' xor'd with 0x40. Although often the same, it is not a symbol for the key sequence. For instance, ^@ means a character that is more likely produced by typing ctrl+space. In any case I think links leading to discussion of Windows shortcuts are wrong; these shortcuts are processed directly from keyboard input and at no point is a C0/C1 control code ever used. Spitzak (talk) 01:52, 29 May 2014 (UTC)
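
The XOR rule for caret notation described above can be sketched in a few lines of Python. The function names `caret_to_code` and `code_to_caret` are hypothetical, and the accepted range (`?` through `_`) is an assumption chosen so that the result is always a C0 code or DEL.

```python
def caret_to_code(notation: str) -> int:
    """Map caret notation such as '^X' to its character code (char XOR 0x40)."""
    if len(notation) != 2 or notation[0] != "^":
        raise ValueError("expected a caret followed by one character")
    ch = ord(notation[1])
    if not 0x3F <= ch <= 0x5F:
        raise ValueError("character outside the range '?' to '_'")
    return ch ^ 0x40


def code_to_caret(code: int) -> str:
    """Inverse mapping for the C0 codes (0x00-0x1F) and DEL (0x7F)."""
    if not (0x00 <= code <= 0x1F or code == 0x7F):
        raise ValueError("not a C0 control code or DEL")
    return "^" + chr(code ^ 0x40)


print(hex(caret_to_code("^X")))  # 0x18 (CAN)
print(hex(caret_to_code("^@")))  # 0x0  (NUL)
print(hex(caret_to_code("^?")))  # 0x7f (DEL)
print(code_to_caret(0x1B))       # ^[   (ESC)
```

The mapping is defined purely on character values, which is why `^@` names NUL regardless of which keys a given keyboard uses to produce it.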

Purpose


What this article doesn't really make clear is why C0 and C1 are in Unicode. The use of U+2400 ... U+243F is immediately obvious, and I guess it makes some sense to reserve NUL, TAB, CR and LF.

But what are you supposed to do when you encounter SI? Obviously you aren't meant to switch to a different character set, because if people wanted to encode a character not in Unicode they'd use a PUA character. Maybe it's part of a quoted string of bytes to send to some machine for which SI does make sense? No, because then you'd use the visual representation ␏.

If you find BEL, are you supposed to sound a bell? Of course not. A Unicode text is just that, text, not a string of instructions to do something. Even when displayed, it tends to be scrollable and no bell moment exists. And you wouldn't want to allow text to ring bells anyway. Again, for quoted bytes there's the visual representation.

What about SOH? Again, meaningless in text unless quoted. Most of these control codes are useless as part of text. Insofar as they make sense at all, it's as formatting, which isn't within the Unicode scope, but within things like HTML and CSS, or whatever format your word processor uses. The only reason it makes sense to reserve NUL, TAB, CR and LF is the sheer ubiquity of simple file formats (we call them text files, but they do contain formatting in addition to text) and memory representations of strings that need these.

So the question is, what is the purpose of the C0 and C1 control codes? — Preceding unsigned comment added by 82.139.81.0 (talk) 18:44, 28 May 2014 (UTC)[reply]

They're in Unicode to preserve compatibility with ASCII etc. character sets. AnonMoos (talk) 03:36, 7 February 2015 (UTC)[reply]
C1 comes from ISO-6429 (aka ECMA-48), and ISO-2022 (aka ECMA-35). It is not so much for compatibility (since the Unicode standard merely lists the names without attempting to describe functionality) as because ISO 10646 grew out of the standardization work for the older encodings. Because Unicode does not describe functionality, it does not standardize C0/C1; it merely makes a few assumptions relying upon those other documents as the relevant standards. TEDickey (talk) 12:05, 7 February 2015 (UTC)

sources discussing smtp rather than ISO 10646


The given sources are discussing SMTP rather than ISO 10646 as such:

The following is a draft for an RFC updating SMTP to allow and encourage use of ISO 10646 (now DIS, of course).

and without a more suitable supplementary source, the statements do not match the source TEDickey (talk) 23:55, 7 April 2015 (UTC)[reply]

If you read this paragraph:
In Internet messages, the dynamic compaction method (compaction method 5) is used, the initial state being G=32, P=32, R=32, with each octet specifying a value of C. (Translated into normal English, that sentence means: "The text is in 8-bit Latin-1 until we get to the first HOP, if any!") Transitions to other character sets, represented by rows and, in some cases, planes, is done with a sequence that begins with the HOP ("High Octet Preset") code (decimal 129). The SGCI ("Single Graphic Character Introducer") is not used (i.e. we use "level 1" of method 5).
It's pretty clear to me it is discussing how the ISO 10646 draft is applied to SMTP. It's not introducing HOP or SGCI itself, it is pulling them from the draft. It would be great if someone could find old ISO 10646 drafts and we could quote them instead, but even in the absence of copies of those old drafts, I don't think there is any other plausible interpretation of this paragraph. SJK (talk) 12:23, 9 April 2015 (UTC)[reply]

Without the said draft, you cannot distinguish the interpretation which you wish to make from an equally plausible one that refers to some ISO-2022 feature which is commented upon as not being in ISO 10646. As such, your commentary in the topic amounts to original research. As I said, you need a supplementary source to provide the information rather than interpreting TEDickey (talk) 00:43, 10 April 2015 (UTC)[reply]

Please see Ken Whistler, Formal Name Aliases for Control Characters, L2/11-281, Unicode Consortium, July 20, 2011, which explains the situation much better than my previous reference did:

Notes Regarding Omissions

I have deliberately omitted three control code names and their abbreviations
which occur in one (obsolete) RFC, but which are an artifact of early
unapproved drafts of 10646. To wit:

0080 PADDING CHARACTER (PAD)
0081 HIGH OCTET PRESET (HOP)
0099 SINGLE GRAPHIC CHARACTER INTRODUCER (SGC)

Those 3 were proposed (on spec) in early drafts of 10646, for what became
a failed architectural direction for 10646. They would be completely forgotten
now except for the persistent (and pernicious) RFC that lists them without
indicating their failed status. Nobody has ever implemented them, so they
are nothing more than character encoding curiosities.

So this reference justifies my inference as correct. I will replace my prior reference with this one. SJK (talk) 10:52, 10 April 2015 (UTC)[reply]

Missing information


These control codes had names in Unicode 1.0 but these names were later removed. The article should explain when and why.

10646-1 forbids the use of C1 controls, requiring an ESC Fe sequence instead. The article should detail when and why this came about and whether or not it is still in force in Unicode. — Preceding unsigned comment added by 82.139.82.82 (talk) 03:22, 6 September 2015 (UTC)

That (ESC Fe) was made obsolete a long time ago, and removed. See this for example. TEDickey (talk) 12:55, 6 September 2015 (UTC)[reply]

merge vs deletion


While it's interesting that Unicode has a subset of C0/C1 codes, deleting most of the content of this topic to replace it by a redirect to a summary paragraph should have some discussion involving the editors who've been maintaining the page. TEDickey (talk) 08:28, 4 August 2016 (UTC)[reply]

C1 control pictures


Why are there no C1 control pictures in the UCS? 1234qwer1234qwer4 (talk) 15:19, 2 June 2019 (UTC)[reply]

For instance this? Likely disinterest on the part of the committee members who were not involved in software development TEDickey (talk) 16:25, 2 June 2019 (UTC)[reply]
The Unicode Public General Mail List is probably a better place to ask this question. Google "c1 control pictures" site:unicode.org to see the discussions that have already taken place. If your question is "Why do C0 controls get pictures but not C1 controls?" then the short answer is compatibility with a legacy encoding that had C0 control pictures. DRMcCreedy (talk) 16:31, 2 June 2019 (UTC)[reply]
Actually, asking on a mailing list can get mixed results. If I wanted to know, I'd ask Frank. Either way, unless someone points to a mail-archive discussing the relevant issues, the best you'd get would be a primary source (unsuitable for topic development). TEDickey (talk) 19:15, 2 June 2019 (UTC)[reply]

What do C0 and C1 mean? Where did they come from? Are there also C2, C3? Or did these exist?


I'd like to see the article explain the origin of the terms "C0" and "C1" and answer all these questions. --RokerHRO (talk) 16:25, 14 April 2020 (UTC)

See C0 and C1 control codes § C1 controls:

In 1973, ECMA-35 and ISO 2022[1] attempted to define a method so an 8-bit "extended ASCII" code could be converted to a corresponding 7-bit code, and vice versa.[2] In a 7-bit environment, the Shift Out (SO) would change the meaning of the 96 bytes 0x20 through 0x7F[a][4] (i.e. all but the C0 control codes), to be the characters that an 8-bit environment would print if it used the same code with the high bit set. This meant that the range 0x80 through 0x9F could not be printed in a 7-bit environment,[2] thus it was decided that no alternative character set could use them, and that these codes should be additional control codes, which become known as the C1 control codes. To allow a 7-bit environment to use these new controls, the sequences ESC @ through ESC _ were to be considered equivalent.[2] The later ISO 8859 standards abandoned support for 7-bit codes, but preserved this range of control characters.

There are only C0 and C1, but ECMA-35/ISO 2022 allow selection of four graphic code sets, G0 through G3, with G0 being the ASCII graphic characters by default.-- 03:05, 29 May 2024 Guy Harris
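
The equivalence described in the quoted passage, between the C1 bytes 0x80 through 0x9F and the two-byte sequences ESC @ through ESC _, can be sketched in Python as follows. The function name `c1_to_7bit` is hypothetical, and the sketch assumes the input is a raw 8-bit byte stream (not UTF-8) in which every byte from 0x80 to 0x9F is a C1 control.

```python
ESC = 0x1B  # Escape, C0 control code


def c1_to_7bit(data: bytes) -> bytes:
    """Replace each 8-bit C1 control with its 7-bit 'ESC + (byte - 0x40)' form."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            # 0x80 becomes ESC '@' (0x40); 0x9F becomes ESC '_' (0x5F)
            out += bytes((ESC, b - 0x40))
        else:
            out.append(b)
    return bytes(out)


# 0x9B is CSI; its 7-bit equivalent is the familiar "ESC [" prefix.
print(c1_to_7bit(b"\x9b1m"))  # b'\x1b[1m'
```

The subtraction of 0x40 is just the arithmetic behind "ESC @ through ESC _"; it does not attempt to parse what follows the control code.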

Notes

  1. ^ In early versions the range excluded SP and DEL[3]

References

  1. ^ ECMA/TC 1 (1973). "Brief History". 7-bit Input/Output Coded Character Set (PDF) (4th ed.). ECMA. ECMA-6:1973.{{citation}}: CS1 maint: numeric names: authors list (link)
  2. ^ a b c ECMA/TC 1 (1971). "8.2: Correspondence between the 7-bit Code and an 8-bit Code". Extension of the 7-bit Coded Character Set (PDF) (1st ed.). ECMA. pp. 21–24. ECMA-35:1971.{{citation}}: CS1 maint: numeric names: authors list (link)
  3. ^ ECMA/TC 1 (1973). "4.2: Specific Control Characters". 7-bit Input/Output Coded Character Set (PDF) (4th ed.). ECMA. p. 16. ECMA-6:1973.{{citation}}: CS1 maint: numeric names: authors list (link)
  4. ^ ECMA/TC 1 (1985). "5.3.8: Sets of 96 graphic characters". Code Extension Techniques (PDF) (4th ed.). ECMA. pp. 17–18. ECMA-35:1985.{{citation}}: CS1 maint: numeric names: authors list (link)

JSON_streaming#Record_separator-delimited_JSON


I'd like to add a link to JSON streaming#Record separator-delimited JSON but I am unsure where it would fit best. --RokerHRO (talk) 22:40, 5 March 2021 (UTC)[reply]

Perhaps in the rightmost column of the table in C0 and C1 control codes#Basic ASCII control codes - there's a big box for FS/GS/RS/US, mentioning various uses of those control characters. Guy Harris (talk) 22:59, 5 March 2021 (UTC)[reply]

State machines


This text in C0 codes is certainly anachronistic and arguably simply wrong:

  • This large number of codes was desirable at the time, as multi-byte controls would require implementation of a state machine in the terminal, which was very difficult with contemporary electronics and mechanical terminals

State machines per se were neither difficult nor expensive. Shift states were required for existing coding systems such as BAUDOT, and were significantly less complex than the shift registers already needed for sending and receiving serial communication.

A state machine that could interpret VT-100 style escape sequences however would have been prohibitive in 1964.

The prime reason for avoiding shift states (or state machines in general) was to cope better with unreliable transmission, though I don't have a citation for that.

To describe 32 as a "large number" is laughable compared with the hundreds of controls that are implemented as sequences of bytes by typical terminal emulators.

Bitwise interpretation of ASCII codes
Maybe this table might be useful in an article, once we've figured out which article.

bits               meaning
0000000, 1111111   no action; ignored
00_____            Controls
__00___            Transmission controls, affecting DCEs
__01___            Layout controls, driving the motors in printers
__10___            Terminal controls, including shift states and device-specific functions
__11___            File format markers
01_____            Digits & punctuation
1______            Letters
_0_____            Upper-case
_1_____            Lower-case

Although ASCII was designed as a coding system for transmission, unlike previous coding systems it could also function as an encoding for computation, with each printable character fitting into a single machine word ("byte", as we would know it today). This meant that there needed to be more than 64 codes, dictating a minimum of 7 bits.

As only around 80-90 graphic characters were envisaged, it would have seemed foolhardy to "skimp" on control codes; clearly at least 16 would be useful.

As there are broadly 4 classes of control codes, and a need for at least 5 transmission controls and 6 format controls, it made logical sense to reserve 4 groups of 8 codes, or 32 in all.

The eventual ASCII standard included codes that deviated from this simple arrangement, but this initial framework is still plain to see.

Martin Kealey (talk) 03:04, 13 August 2022 (UTC)[reply]
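
The bit-pattern table above can be expressed as a small Python sketch. The function name `classify` and the label strings are hypothetical, and the grouping is only the framework proposed in this comment, not a classification taken from the ASCII standard (which, as noted, deviates from it in places).

```python
CONTROL_GROUPS = ("transmission", "layout", "terminal", "file format")


def classify(code: int) -> str:
    """Classify a 7-bit code according to the proposed bit-pattern table."""
    if not 0x00 <= code <= 0x7F:
        raise ValueError("not a 7-bit code")
    if code in (0x00, 0x7F):
        return "no action"                    # 0000000 and 1111111
    top_two = code >> 5                       # the two most significant bits
    if top_two == 0b00:
        group = (code >> 3) & 0b11            # next two bits select the control group
        return CONTROL_GROUPS[group] + " control"
    if top_two == 0b01:
        return "digit or punctuation"
    return "upper-case" if top_two == 0b10 else "lower-case"


for code in (0x04, 0x0A, 0x11, 0x1C, ord("5"), ord("A"), ord("a")):
    print(f"{code:07b}", classify(code))
```

Under this reading, each group of eight consecutive control codes shares its two "group" bits, which matches the "4 groups of 8 codes" described above.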

Space is not a motion control character


It is a whitespace character. Which on computers is a normal character like a or z.

Moving right is a completely different action that does not create a character or change a text string. If it is not caught and handled by the input-handling program, then at best it is mapped to other characters and displayed in a safe way, and at worst, will mess up the terminal.

It seems, many younger people and computer illiterates misunderstand the space character in severe ways that are harmful to everyone, because they still think in terms of writing on paper.

Please do not spread that, and keep it to people who print out the Internet and are used by iDevices.

-- 2A02:3035:610:58B8:24DA:BC5F:806D:B752 (talk) 21:08, 28 May 2024 (UTC)[reply]

I presume you're referring here to the entry in the first table, with SP described as "[Moving] right one character position."
Many older people remember printing terminals, in which the space character moved the print head one position to the right, and changed nothing on the paper. Those were the majority of terminals when ASCII was developed. Page 6 of the 1963 ASCII spec speaks of the character in the 0x20 position as "Word separator [space, normally non-printing]". It does, however, refer to it as a graphic character on page 11.
The 1968 version also describes space as "normally non-printing", but puts it in the "Graphic Characters" section rather than the "Control Characters" section. It says that it is

A normally non-printing graphic character used to separate words. It is also a format effector which controls the movement of the printing position, one printing position forward. (Applicable also to display devices.)

but does not specify in what fashion it's "applicable ... to display devices".
Display terminals usually erased the character at the current display position, and moved one position to the right, when they received a space character, and most if not all terminal emulator programs emulate terminals of that sort. (The Datapoint 3300, however, appears to have supported both "space overwrite" and non-"space overwrite" behavior, perhaps because it was intended to be a replacement for ASCII Teletypes such as the Teletype Model 33, so, while it probably didn't support full overprinting, you could at least overwrite a space with another character.[1])
The right thing to do would probably be to expand "Move right one character position." to something such as "Move right one character position; on display terminals, this usually erases the character at the current character position." Guy Harris (talk) 23:27, 28 May 2024 (UTC)[reply]
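
To make the two behaviours concrete, here is a small Python sketch of one display line. It is only an illustration under simple assumptions: the function name `receive`, the `space_overwrites` flag, and the list-of-cells model are all hypothetical and not taken from any terminal's documentation.

```python
def receive(line: list[str], cursor: int, ch: str,
            space_overwrites: bool = True) -> int:
    """Apply one received character to a display line; return the new cursor.

    Assumes 0 <= cursor < len(line). With space_overwrites=True a space
    erases the cell under the cursor (typical display-terminal behaviour);
    with False it only advances, like a print head moving over paper.
    """
    if ch == " " and not space_overwrites:
        return cursor + 1  # move right, leave the existing cell untouched
    line[cursor] = ch      # ordinary characters (and erasing spaces) overwrite
    return cursor + 1
```

With `line = list("abc")`, calling `receive(line, 0, " ")` leaves `[" ", "b", "c"]`, while passing `space_overwrites=False` leaves the line unchanged; in both cases the cursor moves to position 1.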

References

Unclear passage

WRT "This large number of codes was desirable at the time, as multi-byte controls would require implementation of a state machine in the terminal, which was very difficult with contemporary electronics and mechanical terminals."

What does "at the time" imply? What time? When originally specified?

What does this mean: "multi-byte controls would require implementation of a state machine in the terminal, which was very difficult with contemporary electronics and mechanical terminals" ... state machine? very difficult? contemporary? The info may be correct and may be good. But, it's not clear what it means IMO. And/or there are missing words/ideas. Stevebroshar (talk) 18:59, 28 April 2025 (UTC)[reply]

Your recent edit made this point more obscure. I suggest you find suitable reliable sources to support your edits TEDickey (talk) 20:28, 28 April 2025 (UTC)[reply]
If you're referring to this edit, what it did was 1) removed some stuff about the DEL character being necessary because there needed to be a character to punch out the 7 data bits in a tape to erase a character and 2) combined the paragraph User:Stevebroshar asked about with the previous paragraph without making any changes to its text, so I 1) don't see how it made the point more obscure except for combining it with another paragraph and thus possibly making it stand out less and 2) don't see that the new text requires any more references than the previous text did (which it arguably did).
As for the original questions:
What time? When originally specified? Yes, the time when ASCII was specified as having 32 control characters.
state machine? Yes, a state machine, e.g. if <ESC>A means "perform operation X", the terminal would need some mechanism, such as a state machine, to cause it to treat an 'A' following an <ESC> as meaning "perform operation X" rather than "print an 'A'".
very difficult? I'll leave it up to an electronic designer to say how much effort it'd take to build it from the individual transistors available at the time.
contemporary? 1961-1967. Guy Harris (talk) 07:46, 25 May 2025 (UTC)[reply]
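
As a rough illustration of the state-machine point above, here is a minimal Python sketch. It assumes a made-up two-byte control in which <ESC>A means "perform operation X"; the function name `interpret` and the action strings are hypothetical. The single flag `pending_escape` is the extra state the terminal has to remember between bytes, which a single-byte control would not need.

```python
ESC = 0x1B


def interpret(data: bytes) -> list[str]:
    """Turn a byte stream into a list of actions, honouring ESC sequences."""
    actions = []
    pending_escape = False  # the one bit of state carried between bytes
    for byte in data:
        if pending_escape:
            pending_escape = False
            if byte == ord("A"):
                actions.append("operation X")
            else:
                actions.append(f"unknown escape {chr(byte)!r}")
        elif byte == ESC:
            pending_escape = True  # remember that the next byte is a command
        else:
            actions.append(f"print {chr(byte)!r}")
    return actions
```

Here `interpret(b"A\x1bAB")` yields `["print 'A'", "operation X", "print 'B'"]`: the same byte 'A' is printed or treated as a command depending on the remembered state.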

"Reliable sources"

User:Tedickey keeps adding [better source needed] to the VOS Administration Manual I added as a source. He says it is "user-generated content" and implied that it hasn't been "published"??? (I don't know what he means by this) User TEDickey is very, very unclear about what his objection actually is. He keeps referring me to WP:RS, which I have read. However, based on a comment on my talk page I intuit (because he doesn't say what he means) that he seems to dislike the scribd URL, even though the URL is only for convenience, and the document is what it is regardless of whether it's even accessible online at all, at any URL.

Again, a product manual is an acceptable source for simple facts about a product. It is no more objectionable as a primary source than a standard such as one from ISO or ECMA, plenty of which are cited throughout Wikipedia.

For what it's worth, I changed the URL to point to a page directly on the manufacturer's website, which has a newer version of the manual. 204.225.215.56 (talk) 01:03, 28 May 2025 (UTC)[reply]

The issue is with the hosting website: scribd is a place where users can upload pretty much whatever they want, hence user-generated content, and it could be removed at any time for whatever reason. A link to the manual on the official website is best, which it seems that you have now done.
That said, @Tedickey:, you need to be clear with what you are telling people; you could have very easily explained what I did, rather than referring them to WP:RS over and over again and not telling them WHY. LakesideMinersCome Talk To Me! 01:10, 28 May 2025 (UTC)[reply]
That is indeed clear. However, I don't understand why the hosting site is an issue at all, if the document itself is reliable/valid. Yes, it can be removed, just like any link on the internet might die. And anyone can upload to the host, but that's like saying anyone can sell anything they want on eBay, so a book bought on eBay is automatically unreliable because anyone could have authored it. It's the original document that's the source, not any particular URL that might be hosting it. 204.225.215.56 (talk) 01:22, 28 May 2025 (UTC)[reply]
It's the fact that it's more likely for that link to become dead than, say, a link to the manufacturer site itself. LakesideMinersCome Talk To Me! 02:17, 28 May 2025 (UTC)[reply]


  • Good Morning, I'm here from WP:3O. Having reviewed the discussion, I think the crux of the issue is that we would prefer to link to the original publisher or a reputable publisher to establish provenance. User:LakesideMiners is correct that Scribd doesn't fact check or review anything posted on it so it should be used with caution. It is useful as a non-paygated source though and its URL can be appended to the citation template for that purpose, but the original publication source should still be linked. I wouldn't worry, with either source, about the link becoming dead; the archive link bot here does a pretty good job catching that, and you could always use the archive URL from the Wayback Machine if that is a particular concern.
Hope this helps. Squatch347 (talk) 12:22, 30 May 2025 (UTC)[reply]
The majority of the disagreement is in the next section. 204.225.215.56 (talk) 12:47, 30 May 2025 (UTC)[reply]

Stratus use of SS1-whatever

The recent edit adds a pointer to a table using those names, which are not explained in the source. Their relationship to this topic needs some explanation past an editor's original research efforts TEDickey (talk) 07:30, 28 May 2025 (UTC)[reply]

No. There is no further explanation necessary. None of the other mentions of alternative sets describe how the control functions actually work (except CEX). There is no reason to go into that level of detail. Do you think the control names should not be mentioned at all, unless they have an associated description in the article? I included them because they took up very little space, but if you insist on removing them, fine.
I don't want another ride on this carousel. Please state clearly what your objection actually is, and what you would see as an acceptable resolution. Right now you are applying standards to my contribution that you're not applying to any other part of this article, for reasons you seem determined to obfuscate. 204.225.215.56 (talk) 16:39, 28 May 2025 (UTC)[reply]
Sure: provide a source which explains what "single-shift 15" happens to be. If you haven't a source, then the sentence as it stands is paraphrasing the table's contents without providing the reader any useful information. Based on that single source given, the paragraph doesn't really provide any information that you couldn't have done equally well with a single sentence. From reading the guidelines, you may have understood that Wikipedia is not a source of knowledge, but rather a synopsis of sourced information. Repeating all of the interesting tidbits from the table without further explanation doesn't follow the guidelines. TEDickey (talk) 18:58, 28 May 2025 (UTC)[reply]
"the sentence as it stands is paraphrasing the table's contents without providing the reader any useful information." It provides a short summary of what the table says, in the context of the article. The level of summary is on the level of those that discuss replacing SS2, SS3 and FS in the other descriptions. I don't see the objection you're making that doesn't equally apply to the other items in the list.
"the paragraph doesn't really provide any information that you couldn't have done equally well with a single sentence" So you want the description shorter? Is that the only issue?
"a synopsis of sourced information." That's what the description is. If you think it's too long as a synopsis, then please just say so.
"Sure: provide a source which explains what "single-shift 15" happens to be." That is also in the manual. However, describing its functionality from a primary source does seem like it might veer into (actual!) original research. Here it is. In fact, we can easily glean from the description that it's meant to be used in an ISO/IEC 2022 context where the single-shift area is GR, not GL. But that is too much explanation, which is why I didn't include it, to keep the synopsis at the appropriate level of detail.
Do you object to adding that link as another source?
Would you be satisfied with reducing the description to something like "The Stratus VOS operating system uses its own proprietary C1 set"?
Do you just want the mention removed altogether? 204.225.215.56 (talk) 19:22, 28 May 2025 (UTC)[reply]
Dropping these in here for future reference, especially since the Stratus doc pages are non-trivial to get direct links for:
Stratus refers to the code points 0x80-0x9f as "Stratus-specific" control characters (non-printing characters), in case that was ever in question.
A description of the operation of single-shifts in VOS strings (which work exactly like ISO/IEC 2022 except there are more than 4 graphic sets)
204.225.215.56 (talk) 20:50, 28 May 2025 (UTC)[reply]
In that last link, I see only SS1 and SS2, without any mention of the (presumably non-standard) shifts. Providing a reliable source for the pasted/copied table entries is what is being requested. Without that, there's nothing to discuss TEDickey (talk) 17:50, 29 May 2025 (UTC)[reply]
SS1 is a non-standard shift. Do you expect a thorough explanation of every single control? There isn't one, and there isn't going to be one.
"Providing a reliable source for the pasted/copied table entries" What pasted/copied table entries? If you mean the literal names of the controls, then no, that does not require a secondary source. Again, product manuals are acceptable sources for simple factual statements about a product. The entry is literally quoting the names of items listed by an instruction manual.
You keep refusing to answer what would make the entry acceptable. I ask again: Would removing the names of the individual controls be acceptable to you?
At this point I have to accuse you of bad faith. You are not cooperating in any manner in fixing the entry. You are being obstructionist to no purpose. You have refused to answer simple questions that might help in resolving the matter. You refuse any changes that I have offered, and failed to offer any yourself. I don't know what you expect. You keep asking for "secondary sources" for a statement that does not require them (and are unlikely to exist).
Do you want the contribution removed? Yes/no? 204.225.215.56 (talk) 21:44, 29 May 2025 (UTC)[reply]
I have requested a third opinion. My position is that a secondary source is not necessary for a simple statement about the VOS control set included in the VOS manual. There is no original research here. There is no interpretation. As for secondary sources, I don't believe any exist on this highly specific topic, so if that is necessary, the entry should be removed outright. However, the level of demand being applied to this entry specifically is inconsistent with similar information in this and many related articles. 204.225.215.56 (talk) 22:00, 29 May 2025 (UTC)[reply]

For what it's worth to anyone here, here's the actual list of what the individual single-shift characters and locking-shift introducer do in Stratus OpenVOS, and here's a bit more elaboration on the identities of the G-sets in question. To summarise, they appear to use some eight fixed G-set designations, and allow invoking them over the GR area using single-shifts or locking-shifts:

  • LSI: 0x90 followed by (single shift code + 0x20) accesses the locking-shift codes. Note that despite that page listing them as e.g. LS1, they actually invoke over the GR area, per elsewhere in the Stratus documentation (Stratus also makes the single-shifts invoke over the GR area like in EUC-JP, although the individual G-set designations do not correspond to EUC-JP).
  • SS1: "Latin alphabet No. 1" (i.e. ISO-8859-1 right-hand side)
  • SS2: "Kanji" (JIS C 6226 a.k.a. JIS X 0208)
  • SS3: "Katakana" (JIS C 6220 a.k.a. JIS X 0201)
  • SS4: "Hangul" (KS C 5601 a.k.a. KS X 1001)
  • SS5: "Simplified Chinese" (GB 2312)
  • SS6 and SS7: "Chinese" (presumably means Traditional Chinese); the documentation cites this as "Chinese Central Bureau of Standards (CBS) General Hantz Standard (GHS) 3/86 (2 volumes)", whatever that means. That said, the documentation notes that "Note that Chinese is defined as one standard but occupies two character sets.", which sounds a lot like the original 1986 version of CNS 11643.
  • SS8: "User-defined" (i.e. private-use area)
  • SS9 thru SS15: apparently unused?

--HarJIT (talk) 22:15, 29 May 2025 (UTC)[reply]
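
As a sketch of the LSI rule summarised in the list above (0x90 followed by the single-shift code plus 0x20), the locking-shift sequence could be built like this in Python. The function name `locking_shift_sequence` is hypothetical, the range check assumes the single-shift codes sit in the C1 range 0x80-0x9F, and the actual byte values of SS1 through SS15 are not given in this discussion, so any value passed in is a placeholder to be checked against the Stratus documentation.

```python
LSI = 0x90  # locking-shift introducer, per the summary above


def locking_shift_sequence(single_shift_code: int) -> bytes:
    """Build the two-byte locking-shift sequence for a single-shift code.

    Assumption: single-shift codes are C1 controls (0x80..0x9F), so the
    second byte (code + 0x20) still fits in one byte.
    """
    if not 0x80 <= single_shift_code <= 0x9F:
        raise ValueError("expected a C1 control code (0x80..0x9F)")
    return bytes([LSI, single_shift_code + 0x20])
```

For a purely illustrative code of 0x81 this produces the bytes 0x90 0xA1; that value is not claimed to be any particular Stratus single-shift.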

Sure - that's helpful. The editor (for whatever reason) appears to be fixated on just copying the names from the table without providing any of the supplementary information needed to make the contribution encyclopedic. Feel free to guide the editor toward that goal TEDickey (talk) 22:33, 29 May 2025 (UTC)[reply]
I have asked multiple times what change to the text would satisfy you. How am I "fixated"? I offered to reduce the text, because you were complaining about including too much info and accusing it of being original research. While HarJIT's addition is interesting, it's still the same primary source, which you have been consistently dissatisfied with. I can include all that, if you want -- but you keep refusing to say what you want, except insisting on secondary sources (but only of me, apparently) which don't exist. 204.225.215.56 (talk) 00:03, 30 May 2025 (UTC)[reply]
  • Hi, I found this via WP:3O, but it looks like a third editor has appeared. I think it might be helpful to remember the wiki policy of assuming good faith. With that said, I've read through the article and talk and (this is not my area of expertise) don't really see what is being objected to. TEDickey, can you propose alternative text to that addition that might make it more encyclopedic? Squatch347 (talk) 12:44, 30 May 2025 (UTC)[reply]
    The existing paragraph has a single source (referring to a table), listing names of controls which Stratus has assigned in the range 128-159 (i.e., "C1"). The list of names is not helpful, and that sentence could be improved by eliminating the list (keeping the reference to the table) and adding a sentence indicating that some of the assignments deal with Stratus's use of shifts (which differs from ISO-2022), giving 1-2 references to their manual where they explain their shifts, and then (if there is an explanation in the manual of the "introducer" control) summarizing what it does, providing the reference to the manual. Without that summary and pointer to the source of information, the paragraph is not helpful to readers TEDickey (talk) 19:52, 30 May 2025 (UTC)[reply]
    Ok, that makes sense I think. How would you write it to provide a better context? Squatch347 (talk) 20:12, 30 May 2025 (UTC)[reply]
    I removed the list of control names. I can add the extra detail TEDickey requests which, by necessity, includes the control names again. I'm quite frustrated by this: is the objection that there is too much information, or not enough? Am I to quote more, or less, from the manual? I can't do both. As he has objected to my using a primary source, and accused me of "original research", the statement that more references to the same manual will work feels like a trap. I also disagree that the simple list was "not helpful". It was at the level of detail that I find most naturally helpful; listing points of interest without going into a full-on technical description of a very niche aspect on what is a much more general article. 204.225.215.56 (talk) 20:32, 30 May 2025 (UTC)[reply]
    I added detail for the SS controls. Is that cited properly? (No, thanks, I don't need yet another condescending link to WP:RS.) It's primary sources all around, because secondary sources are most likely nonexistent. Is the level of detail OK? IMO it's too much, compared to the other entries in this list, but this is what was requested. The introducer controls can be filled by someone else. 204.225.215.56 (talk) 21:34, 30 May 2025 (UTC)[reply]
    The citations look correct. TEDickey (talk) 22:06, 30 May 2025 (UTC)[reply]
    This has been an unjustifiably frustrating experience. Communicating badly and acting smug when you're misunderstood is not cleverness. Other editors (even unregistered ones) are not dogs to be trained to do tricks to your satisfaction. Seniority is not wisdom. It's especially bad when a user such as @TEDickey, with such a long history on this site behaves like this. If you want to chase away newbies, this is how it's done. At no point did TEDickey actually work to improve the added passage. All his contributions seem to consist of reverts and (in my case) repeatedly marking my contributions as faulty, without any productive input to fix the perceived faults. I note his behavior, and clarity of communication, improves the instant he's conversing with someone other than me. He ignored the majority of my questions but dealt normally with everyone else. His condescension (including repeated links to pages I've read, and saying I should be "guided") cannot be good faith.
    I believe I'm owed a proper apology. Until I have it, I hope I never encounter this user again. 204.225.215.56 (talk) 23:30, 30 May 2025 (UTC)[reply]