ASCII misc (The recode reference manual)

8 ASCII and some derivatives

8.1 Usual ASCII

This charset is available in recode under the name ASCII. Its true name is ANSI_X3.4-1968 as per RFC 1345, and the accepted aliases are ANSI_X3.4-1986, ASCII, IBM367, ISO646-US, ISO_646.irv:1991, US-ASCII, cp367, iso-ir-6 and us. The shortest way to specify it in recode is us.
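These RFC 1345 names are not specific to recode. As an illustration outside recode, the sketch below uses Python's standard `codecs` registry to show that each alias listed above resolves to Python's built-in `ascii` codec, and that this codec rejects any byte above 127, which reflects the seven-bit nature of the charset. It says nothing about how recode itself matches charset names; the helper names are hypothetical.

```python
import codecs

# Aliases of ANSI_X3.4-1968 as listed above (RFC 1345 names).
ASCII_ALIASES = [
    "ANSI_X3.4-1968", "ANSI_X3.4-1986", "ASCII", "IBM367", "ISO646-US",
    "ISO_646.irv:1991", "US-ASCII", "cp367", "iso-ir-6", "us",
]


def resolves_to_ascii(name: str) -> bool:
    """Return True if Python's codec registry maps `name` to its ASCII codec."""
    return codecs.lookup(name).name == "ascii"


def is_seven_bit(data: bytes) -> bool:
    """ASCII defines only codes 0..127; the ascii codec refuses anything else."""
    try:
        data.decode("us")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for alias in ASCII_ALIASES:
        print(f"{alias:18} -> {codecs.lookup(alias).name}")
    print(is_seven_bit(b"plain text"))  # True
    print(is_seven_bit(bytes([233])))   # False: 233 lies outside ASCII
```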

This documentation used to include ASCII tables. They have been removed since the recode program can now recreate these easily:

recode -lf us                   for commented ASCII
recode -ld us                   for concise decimal table
recode -lo us                   for concise octal table
recode -lh us                   for concise hexadecimal table

8.2 ASCII extended by Latin Alphabets

There are many Latin charsets. The following was written by Tim Lasko (lasko@video.dec.com) a long while ago:

ISO Latin-1, or more completely ISO Latin Alphabet No 1, is now an international standard as of February 1987 (IS 8859, Part 1). For those American USEnet’rs that care, the 8-bit ASCII standard, which is essentially the same code, is going through the final administrative processes prior to publication. ISO Latin-1 (IS 8859/1) is actually one of an entire family of eight-bit one-byte character sets, all having ASCII on the left hand side, and with varying repertoires on the right hand side:

Latin Alphabet No 1 (caters to Western Europe - now approved).
Latin Alphabet No 2 (caters to Eastern Europe - now approved).
Latin Alphabet No 3 (caters to SE Europe + others - in draft ballot).
Latin Alphabet No 4 (caters to Northern Europe - in draft ballot).
Latin-Cyrillic alphabet (right half all Cyrillic - processing currently suspended pending USSR input).
Latin-Arabic alphabet (right half all Arabic - now approved).
Latin-Greek alphabet (right half Greek + symbols - in draft ballot).
Latin-Hebrew alphabet (right half Hebrew + symbols - proposed).

The ISO Latin Alphabet 1 is available as a charset in recode under the name Latin-1. Its true name is ISO_8859-1:1987 as per RFC 1345, and the accepted aliases are CP819, IBM819, ISO-8859-1, ISO_8859-1, iso-ir-100, l1 and Latin-1. The shortest way to specify it in recode is l1.

It is an eight-bit code that coincides with ASCII in its lower half (codes 0 to 127). This documentation used to include Latin-1 tables. They have been removed since the recode program can now recreate these easily:

recode -lf l1                   for commented ISO Latin-1
recode -ld l1                   for concise decimal table
recode -lo l1                   for concise octal table
recode -lh l1                   for concise hexadecimal table
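The claim that Latin-1 coincides with ASCII in its lower half can be checked mechanically. The sketch below uses Python's standard codec registry, not recode: it confirms that the Latin-1 aliases listed above resolve to Python's `iso8859-1` codec, that bytes 0 to 127 decode identically under ASCII and Latin-1, and that every Latin-1 byte value equals the Unicode code point it decodes to. The helper names are hypothetical.

```python
import codecs

# Names of ISO_8859-1:1987 as listed above (RFC 1345 names).
LATIN1_ALIASES = [
    "ISO_8859-1:1987", "CP819", "IBM819", "ISO-8859-1", "ISO_8859-1",
    "iso-ir-100", "l1", "Latin-1",
]


def lower_half_matches_ascii() -> bool:
    """Bytes 0..127 decode to the same characters under ASCII and Latin-1."""
    lower = bytes(range(128))
    return lower.decode("us") == lower.decode("l1")


def byte_equals_code_point() -> bool:
    """Each Latin-1 byte 0..255 decodes to the code point of the same value."""
    return all(ord(bytes([b]).decode("l1")) == b for b in range(256))


if __name__ == "__main__":
    for alias in LATIN1_ALIASES:
        print(f"{alias:16} -> {codecs.lookup(alias).name}")
    print(lower_half_matches_ascii())  # True
    print(byte_equals_code_point())    # True
    # Byte 233 is the e with an acute accent (U+00E9).
    print(bytes([233]).decode("l1") == "\u00e9")  # True
```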

8.3 ASCII 7-bits, `BS` to overstrike

This charset is available in recode under the name ASCII-BS, with BS as an acceptable alias.

The file is straight ASCII, seven bits only. According to the definition of ASCII, a diacritic is applied by a sequence of three characters: the letter, one BS, then the diacritic mark. We deviate slightly from this by exchanging the diacritic mark and the letter, so that on a screen device the letter overwrites the diacritic and only the letter remains visible. When reading ASCII-BS, both orders are accepted.
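A minimal Python sketch of this overstrike convention, assuming text is handled as Unicode strings and that an accented letter can be split into letter plus combining mark with NFD normalization. Only the acute accent to apostrophe pairing is confirmed by the recode output shown further in this section; the other pairings in the table are illustrative assumptions, and the names `encode_bs` and `decode_bs` are hypothetical. The encoder writes the mark first, as recode does; the decoder accepts both orders.

```python
import unicodedata

BS = "\b"

# Combining mark -> ASCII spacing mark. Only acute -> apostrophe is confirmed
# by the recode example in this section; the others are assumptions.
MARK_TO_ASCII = {
    "\u0301": "'",   # combining acute accent
    "\u0300": "`",   # combining grave accent (assumed)
    "\u0302": "^",   # combining circumflex (assumed)
    "\u0308": '"',   # combining diaeresis (assumed)
}
ASCII_TO_MARK = {v: k for k, v in MARK_TO_ASCII.items()}


def encode_bs(text: str) -> str:
    """Write each letter carrying one known mark as mark, BS, letter."""
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in MARK_TO_ASCII:
            out.append(MARK_TO_ASCII[decomposed[1]] + BS + decomposed[0])
        else:
            out.append(ch)  # anything outside the table passes through
    return "".join(out)


def decode_bs(text: str) -> str:
    """Rebuild accented letters, accepting mark-BS-letter and letter-BS-mark."""
    out = []
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == BS:
            first, last = text[i], text[i + 2]
            pair = None
            if first in ASCII_TO_MARK and last.isalpha():
                pair = (last, first)   # recode's order: mark, BS, letter
            elif last in ASCII_TO_MARK and first.isalpha():
                pair = (first, last)   # ASCII's order: letter, BS, mark
            if pair is not None:
                letter, mark = pair
                out.append(unicodedata.normalize("NFC", letter + ASCII_TO_MARK[mark]))
                i += 3
                continue
        out.append(text[i])
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print([ord(c) for c in encode_bs("\u00e9")])  # [39, 8, 101]
```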

The French quotes are coded by the sequences < BS " or " BS < for the opening quote, and > BS " or " BS > for the closing quote. This artificial convention was inherited by straight ASCII-BS from Bang-Bang entry habits, and is not well known. We nevertheless keep it so that the ASCII-BS charset does not lose French quotes.
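The quote convention above can be sketched as a small table lookup. This Python sketch decodes all four three-character sequences to the guillemets « (U+00AB) and » (U+00BB). Which of the two forms recode itself writes on output is not stated here, so the encoder's choice of the < BS " and > BS " forms is an assumption; the function names are hypothetical.

```python
BS = "\b"

# Both orders are accepted for each French quote, as described above.
QUOTE_SEQUENCES = {
    "<" + BS + '"': "\u00ab",   # opening quote
    '"' + BS + "<": "\u00ab",
    ">" + BS + '"': "\u00bb",   # closing quote
    '"' + BS + ">": "\u00bb",
}


def decode_french_quotes(text: str) -> str:
    """Replace every three-character quote sequence with the real guillemet."""
    out = []
    i = 0
    while i < len(text):
        triple = text[i:i + 3]
        if triple in QUOTE_SEQUENCES:
            out.append(QUOTE_SEQUENCES[triple])
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def encode_french_quotes(text: str) -> str:
    """Write guillemets with the '<' / '>' first forms (an assumed choice)."""
    return text.replace("\u00ab", "<" + BS + '"').replace("\u00bb", ">" + BS + '"')
```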

The ASCII-BS charset is independent of ASCII, and different from it. The following examples demonstrate this; note that ‘!2’ is the Bang-Bang way of representing an e with an acute accent. Compare:

% echo \!2 | recode -v bang..l1/d
Request: Bang-Bang..ISO-8859-1/Decimal-1
233,  10

with:

% echo \!2 | recode -v bang..bs/d
Request: Bang-Bang..ISO-8859-1..ASCII-BS/Decimal-1
 39,   8, 101,  10

In the first case, the e with an acute accent is merely transmitted as the single Latin-1 code 233, with no special recoding rule involved. In the Latin-1..ASCII-BS case, diacriticised characters have special rules: the acute accent is applied over the e with a backspace, giving an apostrophe (39), a BS (8) and the e (101), in the mark-first order described above. For the ASCII-BS charset, reversibility is still possible, but there might be difficult cases.
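The decimal listings in the two examples can be reproduced to check the byte values by hand. This Python sketch formats bytes as comma-separated decimal codes, each right-aligned in three columns, which matches the two short outputs above. It is only an approximation of recode's Decimal-1 surface: how recode wraps longer outputs across lines is not modelled, and `decimal_listing` is a hypothetical name.

```python
def decimal_listing(data: bytes) -> str:
    """Comma-separated decimal codes, each right-aligned in three columns."""
    return ", ".join(f"{b:3d}" for b in data)


if __name__ == "__main__":
    # Latin-1 output: the e with an acute accent is byte 233, then newline.
    print(decimal_listing("\u00e9\n".encode("latin-1")))
    # ASCII-BS output: apostrophe, backspace, e, newline.
    print(decimal_listing(b"'\be\n"))
```

The first call prints the same line as the bang..l1/d example, and the second the same line as the bang..bs/d example.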

8.4 ASCII without diacritics nor underline

This charset is available in recode under the name flat.

This code is ASCII with all diacritics and underlines removed, provided they are applied using three-character sequences with BS in the middle. Also, though this is somewhat unrelated, each control character is represented by a sequence of two or three graphic characters. The newline character, however, keeps its function and is not represented this way.

Note that charset flat is a terminal charset. We can convert to flat, but not from it.
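A rough Python sketch of what converting to flat involves, under stated assumptions: the set of characters treated as overstruck marks is illustrative, and control characters are spelled in caret notation (such as ^G or ^?), which is a hypothetical choice; recode's own two- or three-character spellings may differ. The names `to_flat` and `control_name` are hypothetical.

```python
BS = "\b"

# Characters treated as overstruck marks; '_' stands for underlining.
# This set is an illustrative assumption, not recode's actual table.
OVERSTRIKE_MARKS = set("'`^\",~_")


def control_name(code: int) -> str:
    """Spell a control character in caret notation (assumed, e.g. ^G, ^?)."""
    return "^?" if code == 127 else "^" + chr(code + 64)


def to_flat(text: str) -> str:
    """Drop overstruck marks and underlines, spell out control characters."""
    out = []
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == BS:
            first, last = text[i], text[i + 2]
            if last in OVERSTRIKE_MARKS and first not in OVERSTRIKE_MARKS:
                out.append(first)  # letter, BS, mark: keep the letter
                i += 3
                continue
            if first in OVERSTRIKE_MARKS and last not in OVERSTRIKE_MARKS:
                out.append(last)   # mark, BS, letter: keep the letter
                i += 3
                continue
        ch = text[i]
        if ch != "\n" and (ord(ch) < 32 or ord(ch) == 127):
            out.append(control_name(ord(ch)))
        else:
            out.append(ch)  # graphic characters and newline pass through
        i += 1
    return "".join(out)
```

Since `to_flat("'\be")` and `to_flat("e")` both yield `e`, the accent cannot be recovered from the output, which illustrates why a conversion back from flat could not be exact.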

