Next: IBM and MS, Previous: Tabular, Up: Top [Contents][Index]
• ASCII: Usual ASCII
• ISO 8859: ASCII extended by Latin Alphabets
• ASCII-BS: ASCII 7-bits, BS to overstrike
• flat: ASCII without diacritics nor underline
Next: ISO 8859, Previous: ASCII misc, Up: ASCII misc [Contents][Index]
This charset is available in recode under the name ASCII. In fact, its true name is ANSI_X3.4-1968 as per RFC 1345, accepted aliases being ANSI_X3.4-1986, ASCII, IBM367, ISO646-US, ISO_646.irv:1991, US-ASCII, cp367, iso-ir-6 and us. The shortest way of specifying it in recode is us.
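As an aside, the same RFC 1345 names are also understood outside of recode. The following is only a sketch, using Python's standard codecs registry rather than recode itself, to show that every name listed above resolves to one and the same charset. The list ASCII_NAMES merely repeats the aliases from the paragraph above; how Python normalises and spells these names is its own business and says nothing about recode internals.

```python
import codecs

# The true name and aliases quoted above, exactly as spelled in the text.
ASCII_NAMES = [
    "ANSI_X3.4-1968", "ANSI_X3.4-1986", "ASCII", "IBM367", "ISO646-US",
    "ISO_646.irv:1991", "US-ASCII", "cp367", "iso-ir-6", "us",
]

# Map each spelling to the canonical codec name Python picks for it.
resolved = {name: codecs.lookup(name).name for name in ASCII_NAMES}

for name, canonical in resolved.items():
    print(f"{name:18} -> {canonical}")
```

All entries are expected to map to a single canonical name, which is the point of having aliases in the first place.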
This documentation used to include ASCII tables. They have been removed since the recode program can now recreate these easily:
    recode -lf us    for commented ASCII
    recode -ld us    for concise decimal table
    recode -lo us    for concise octal table
    recode -lh us    for concise hexadecimal table
Next: ASCII-BS, Previous: ASCII, Up: ASCII misc [Contents][Index]
There are many Latin charsets. The following has been written by Tim Lasko lasko@video.dec.com, a long while ago:
ISO Latin-1, or more completely ISO Latin Alphabet No 1, is now an international standard as of February 1987 (IS 8859, Part 1). For those American USEnet’rs that care, the 8-bit ASCII standard, which is essentially the same code, is going through the final administrative processes prior to publication. ISO Latin-1 (IS 8859/1) is actually one of an entire family of eight-bit one-byte character sets, all having ASCII on the left hand side, and with varying repertoires on the right hand side:
- Latin Alphabet No 1 (caters to Western Europe - now approved).
- Latin Alphabet No 2 (caters to Eastern Europe - now approved).
- Latin Alphabet No 3 (caters to SE Europe + others - in draft ballot).
- Latin Alphabet No 4 (caters to Northern Europe - in draft ballot).
- Latin-Cyrillic alphabet (right half all Cyrillic - processing currently suspended pending USSR input).
- Latin-Arabic alphabet (right half all Arabic - now approved).
- Latin-Greek alphabet (right half Greek + symbols - in draft ballot).
- Latin-Hebrew alphabet (right half Hebrew + symbols - proposed).
The ISO Latin Alphabet 1 is available as a charset in recode under the name Latin-1. In fact, its true name is ISO_8859-1:1987 as per RFC 1345, accepted aliases being CP819, IBM819, ISO-8859-1, ISO_8859-1, iso-ir-100, l1 and Latin-1. The shortest way of specifying it in recode is l1.
It is an eight-bit code which coincides with ASCII for the lower half.
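The claim that the lower half coincides with ASCII is easy to check mechanically. Here is a minimal sketch, assuming Python's built-in "ascii" and "latin-1" codecs as stand-ins for the two charsets (recode is not involved): it decodes the 128 lower codes both ways and compares the results, then decodes one code from the upper half.

```python
# Codes 0..127 make up the lower half shared by ASCII and ISO 8859-1.
lower_half = bytes(range(128))
same_lower_half = lower_half.decode("ascii") == lower_half.decode("latin-1")

# Codes 128..255 exist only in Latin-1; 233 is e with an acute accent.
e_acute = bytes([233]).decode("latin-1")

print(same_lower_half)         # True
print(e_acute == "\u00e9")     # True
```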
This documentation used to include Latin-1 tables. They have been removed since the recode program can now recreate these easily:
    recode -lf l1    for commented ISO Latin-1
    recode -ld l1    for concise decimal table
    recode -lo l1    for concise octal table
    recode -lh l1    for concise hexadecimal table
Next: flat, Previous: ISO 8859, Up: ASCII misc [Contents][Index]
This charset is available in recode under the name ASCII-BS, with BS as an acceptable alias.
The file is straight ASCII, seven bits only. According to the definition of ASCII, diacritics are applied by a sequence of three characters: the letter, one BS, the diacritic mark. We deviate slightly from this by exchanging the diacritic mark and the letter so that, on a screen device, the diacritic will disappear and leave the letter alone. At recognition time, both orders are acceptable.
The French quotes are coded by the sequences: < BS " or " BS < for the opening quote and > BS " or " BS > for the closing quote. This artificial convention was inherited in straight ASCII-BS from habits around Bang-Bang entry, and is not well known. But we decided to stick to it so that the ASCII-BS charset will not lose French quotes.
The ASCII-BS charset is independent of ASCII, and different. The following examples demonstrate this, knowing in advance that ‘!2’ is the Bang-Bang way of representing an e with an acute accent. Compare:
    % echo \!2 | recode -v bang..l1/d
    Request: Bang-Bang..ISO-8859-1/Decimal-1
    233, 10
with:
    % echo \!2 | recode -v bang..bs/d
    Request: Bang-Bang..ISO-8859-1..ASCII-BS/Decimal-1
    39, 8, 101, 10
In the first case, the e with an acute accent is merely transmitted by the Latin-1..ASCII mapping, not having a special recoding rule for it. In the Latin-1..ASCII-BS case, the acute accent is applied over the e with a backspace: diacriticised characters have special rules. For the ASCII-BS charset, reversibility is still possible, but there might be difficult cases.
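The overstrike convention described in this section can be sketched in a few lines. This is only an illustration in Python, not the code recode uses. The names to_overstrike, from_overstrike, MARKS and QUOTE_TO_ANGLE are hypothetical. The acute accent written as an apostrophe is confirmed by the 39, 8, 101 output above, and the French quote sequences come from the text; the ASCII stand-ins chosen for the other diacritics are assumptions.

```python
import unicodedata

BS = "\b"

# Combining mark -> ASCII stand-in. Only acute -> apostrophe is confirmed
# by the example output above; every other pair is an assumption.
MARKS = {
    "\u0301": "'",   # acute (confirmed: 39, 8, 101)
    "\u0300": "`",   # grave (assumed)
    "\u0302": "^",   # circumflex (assumed)
    "\u0308": '"',   # diaeresis (assumed)
    "\u0303": "~",   # tilde (assumed)
    "\u0327": ",",   # cedilla (assumed)
}
ASCII_TO_MARK = {v: k for k, v in MARKS.items()}

# French quotes pair an angle bracket with a double quote around the BS.
QUOTE_TO_ANGLE = {"\u00ab": "<", "\u00bb": ">"}
ANGLE_TO_QUOTE = {v: k for k, v in QUOTE_TO_ANGLE.items()}


def to_overstrike(text):
    """Write each known accented letter as: mark, BS, letter."""
    out = []
    for ch in text:
        parts = unicodedata.normalize("NFD", ch)
        if ch in QUOTE_TO_ANGLE:
            out.append(QUOTE_TO_ANGLE[ch] + BS + '"')
        elif len(parts) == 2 and parts[0].isascii() and parts[1] in MARKS:
            out.append(MARKS[parts[1]] + BS + parts[0])
        else:
            out.append(ch)          # anything else passes through unchanged
    return "".join(out)


def _combine(a, b):
    """Return the character meant by 'a BS b', or None if unrecognised."""
    for x, y in ((a, b), (b, a)):   # both orders are accepted
        if x == '"' and y in ANGLE_TO_QUOTE:
            return ANGLE_TO_QUOTE[y]
        if x in ASCII_TO_MARK and y.isalpha():
            return unicodedata.normalize("NFC", y + ASCII_TO_MARK[x])
    return None


def from_overstrike(text):
    """Recognise three-character overstrike sequences in either order."""
    out, i = [], 0
    while i < len(text):
        a, mid, b = text[i], text[i + 1:i + 2], text[i + 2:i + 3]
        combined = _combine(a, b) if mid == BS and b else None
        if combined is not None:
            out.append(combined)
            i += 3
        else:
            out.append(a)
            i += 1
    return "".join(out)


print([ord(c) for c in to_overstrike("\u00e9\n")])   # [39, 8, 101, 10]
print(from_overstrike("e\b'") == "\u00e9")            # True
```

The sketch also hints at where the difficult cases come from: a double quote followed by BS and a letter is read here as a diaeresis, so plain ASCII text that happens to contain such a sequence would not survive a round trip.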
Previous: ASCII-BS, Up: ASCII misc [Contents][Index]
This charset is available in recode under the name flat.
This code is ASCII expunged of all diacritics and underlines, as long as they are applied using three character sequences, with BS in the middle. Also, although slightly unrelated, each control character is represented by a sequence of two or three graphic characters. The newline character, however, keeps its functionality and is not represented.
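The removal of overstruck diacritics and underlines can be sketched as follows. This is an illustration in Python under stated assumptions, not the rules recode applies: the function name flatten is hypothetical, the choice of keeping the letter-or-digit side of each sequence is a guess at the intent, and the representation of control characters is left out because the text does not spell it out.

```python
import re

# Any character, one BS, any character (newline and BS itself excluded).
OVERSTRIKE = re.compile(r"([^\x08\n])\x08([^\x08\n])")


def flatten(text):
    """Drop the diacritic or underline from each three-character sequence."""
    def keep_letter(match):
        first, second = match.group(1), match.group(2)
        # Keep the side that is a letter or digit. When that does not
        # settle it, keep the second character, which is the one a screen
        # device would end up showing (an assumption of this sketch).
        if first.isalnum() and not second.isalnum():
            return first
        return second
    return OVERSTRIKE.sub(keep_letter, text)


print(flatten("'\be"))               # e
print(flatten("_\bf_\bl_\ba_\bt"))   # flat
```

Since the diacritic is simply thrown away, nothing in the output says whether one was ever there.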
Note that charset flat is a terminal charset. We can convert to flat, but not from it.