The recode program provides various IBM or Microsoft code pages (see Tabular). An easy way to list them all at once from the recode program itself is through the command:
recode -l | egrep -i '(CP|IBM)[0-9]'
See also the few special charsets presented in the following sections.
• EBCDIC: EBCDIC codes
• IBM-PC: IBM’s PC code
• Icon-QNX: Unisys’ Icon code
This charset is IBM’s Extended Binary Coded Decimal Interchange Code. It is an eight-bit code. The following three variants were implemented in recode independently of RFC 1345:
EBCDIC
In recode, the us..ebcdic conversion is identical to the ‘dd conv=ebcdic’ conversion, and the ebcdic..us conversion is identical to the ‘dd conv=ascii’ conversion. This charset also represents the way Control Data Corporation relates EBCDIC to 8-bit ASCII.
EBCDIC-CCC
In recode, the us..ebcdic-ccc or ebcdic-ccc..us conversions represent the way Concurrent Computer Corporation (formerly Perkin Elmer) relates EBCDIC to 8-bit ASCII.
EBCDIC-IBM
In recode, the us..ebcdic-ibm conversion is almost identical to the GNU ‘dd conv=ibm’ conversion. Given the exact ‘dd conv=ibm’ conversion table, recode once said:
Codes 91 and 213 both recode to 173
Codes 93 and 229 both recode to 189
No character recodes to 74
No character recodes to 106
So I arbitrarily chose to recode 213 by 74 and 229 by 106. This makes the EBCDIC-IBM recoding reversible, but this is not necessarily the best correction. In any case, I think that GNU dd should be amended. dd and recode should ideally agree on the same correction. So, this table might change once again.
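The reversibility check described above can be sketched in a few lines of Python. This is only an illustration, not recode's own code: the `audit_table` helper is hypothetical, and the `toy` table is NOT the real ‘dd conv=ibm’ table. It is an identity table rearranged so that it shows the same four symptoms quoted above.

```python
from collections import defaultdict

def audit_table(table):
    """Return (collisions, unused) for a 256-entry byte translation table.

    collisions maps each target code reached from several source codes
    to the list of those sources; unused lists codes that nothing maps to.
    """
    sources = defaultdict(list)
    for src, dst in enumerate(table):
        sources[dst].append(src)
    collisions = {dst: srcs for dst, srcs in sources.items() if len(srcs) > 1}
    unused = [code for code in range(256) if code not in sources]
    return collisions, unused

# Toy table (an assumption for illustration, not the dd table).
toy = list(range(256))
toy[74], toy[106] = 213, 229    # so that nothing else targets 74 or 106
toy[173], toy[189] = 91, 93
toy[91], toy[213] = 173, 173    # 91 and 213 collide on 173
toy[93], toy[229] = 189, 189    # 93 and 229 collide on 189

collisions, unused = audit_table(toy)
print(collisions)
print(unused)

# The arbitrary correction from the text: recode 213 by 74 and 229 by 106.
toy[213] = 74
toy[229] = 106
fixed_collisions, fixed_unused = audit_table(toy)
print(fixed_collisions, fixed_unused)
```

A table is reversible exactly when both results are empty, that is, when it is a permutation of the 256 codes.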
RFC 1345 brings into recode 15 other EBCDIC charsets, and 21 other charsets having EBCDIC in at least one of their alias names. You can get a list of all these by executing:
recode -l | grep -i ebcdic
Note that recode may convert a pure stream of EBCDIC characters, but it does not know how to handle the binary data that is sometimes placed between records to delimit them and build physical blocks. If ends of lines are not marked, a fixed record size may produce something readable, but VB or VBS blocking is likely to yield some garbage in the converted results.
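For the fixed record size case, a minimal Python sketch of cutting an unmarked EBCDIC stream into lines might look as follows. It rests on assumptions not taken from this manual: the `split_fixed_records` helper is hypothetical, the record size must be known in advance, and Python's `cp037` codec is used as one available EBCDIC code page, which does not necessarily match any of the recode tables described here.

```python
def split_fixed_records(data, record_size, codec="cp037"):
    """Decode fixed-size EBCDIC records and drop their trailing padding.

    Assumes pure character data with no record or block descriptors,
    so it is not suitable for VB or VBS blocking.
    """
    if len(data) % record_size:
        raise ValueError("data length is not a multiple of the record size")
    records = []
    for start in range(0, len(data), record_size):
        chunk = data[start:start + record_size]
        records.append(chunk.decode(codec).rstrip(" "))
    return records

# Two 8-byte records, padded with EBCDIC spaces, with no line markers.
sample = "HELLO   WORLD   ".encode("cp037")
lines = split_fixed_records(sample, 8)
print(lines)
```

With VB or VBS data, the binary length fields between records would be decoded as if they were characters, which is the garbage mentioned above.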
This charset is available in recode under the name IBM-PC, with dos, MSDOS and pc as acceptable aliases. The shortest way of specifying it in recode is pc.
The charset is aimed at IBM PC microcomputers and compatibles. This is an eight-bit code. This charset is fairly old in recode; its tables were produced a long while ago by mere inspection of a printed chart of the IBM-PC codes and glyphs.
It has CR-LF as its implied surface. This means that, if the original end-of-line markers have to be preserved while going out of IBM-PC, they should currently be added back through the usage of a surface on the other charset or, better, just never removed. Here are examples for both cases:
recode pc..l2/cl < input > output
recode pc/..l2 < input > output
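The effect of an implied CR-LF surface can be imitated in Python, as a rough sketch only. The `pc_to_latin2` helper is hypothetical, and Python's `cp437` codec stands in for the IBM-PC charset even though the two tables are not claimed to be identical. With `keep_crlf=False` the line endings are converted as an implied surface would do by default; with `keep_crlf=True` they are never removed.

```python
def pc_to_latin2(data, keep_crlf):
    """Recode PC bytes to Latin-2, optionally keeping CR-LF line endings."""
    text = data.decode("cp437")
    if not keep_crlf:
        # Remove the CR-LF surface: each CR-LF pair becomes a plain LF.
        text = text.replace("\r\n", "\n")
    return text.encode("iso8859_2")

sample = "café\r\n".encode("cp437")
unix_style = pc_to_latin2(sample, keep_crlf=False)
dos_style = pc_to_latin2(sample, keep_crlf=True)
print(unix_style, dos_style)
```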
RFC 1345 brings into recode 44 ‘IBM’ charsets or code pages, and also 8 other code pages. You can get a list of all these by executing (11):
recode -l | egrep -i '(CP|IBM)[0-9]'
All charsets or aliases beginning with the letters ‘CP’ or ‘IBM’ also have CR-LF as their implied surface. The same is true for a purely numeric alias in the same family. For example, all of 819, CP819 and IBM819 imply CR-LF as a surface. Note that ISO-8859-1 does not imply a surface, even though it shares the same tabular data as 819.
There are a few discrepancies between this IBM-PC charset and the very similar RFC 1345 charset ibm437, which have not been analysed yet, so the charsets are being kept separate for now. This might change in the future, and the IBM-PC charset might disappear. Wizards would be interested in comparing the output of these two commands:
recode -vh IBM-PC..Latin-1
recode -vh IBM437..Latin-1
The first command uses the charset prior to RFC 1345 introduction. The two methods give different recodings. These differences are annoying; the fuzziness will have to be explained and settled one day.
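The same kind of comparison between two similar tables can be sketched in Python. The tables of recode's IBM-PC charset are not available there, so this example uses Python's `cp437` and `cp850` codecs purely as stand-ins to show the technique. The `differing_bytes` helper is hypothetical.

```python
def differing_bytes(codec_a, codec_b):
    """Map each byte value that decodes differently to its two characters."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        char_a = raw.decode(codec_a, errors="replace")
        char_b = raw.decode(codec_b, errors="replace")
        if char_a != char_b:
            diffs[value] = (char_a, char_b)
    return diffs

diffs = differing_bytes("cp437", "cp850")
for value, (char_a, char_b) in sorted(diffs.items()):
    print(f"{value:3d}  {char_a!r}  {char_b!r}")
```

Listing only the differing positions keeps the discrepancies easy to review, where two full tables would have to be compared by eye.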
This charset is available in recode under the name Icon-QNX, with QNX as an acceptable alias.
The file uses Unisys’ Icon way of representing diacritics with code 25 escape sequences, under the QNX system. This is a seven-bit code, even if eight-bit codes can flow through as part of the IBM-PC charset.
(11) On DOS/Windows, stock shells do not know that apostrophes quote special characters like |, so one needs to use double quotes instead of apostrophes.