The recode program provides various IBM or Microsoft code pages
(see Tabular). An easy way to find them all at once out of the
recode program itself is through the command:
recode -l | egrep -i '(CP|IBM)[0-9]'
See also the few special charsets presented in the following sections.
• EBCDIC: EBCDIC codes
• IBM-PC: IBM’s PC code
• Icon-QNX: Unisys’ Icon code
EBCDIC is IBM’s Extended Binary Coded Decimal Interchange Code.
This is an eight-bit code. The following three variants were
implemented in recode independently of RFC 1345:
EBCDIC
In recode, the us..ebcdic conversion is identical to the
‘dd conv=ebcdic’ conversion, and the ebcdic..us conversion is
identical to the ‘dd conv=ascii’ conversion. This charset also represents
the way Control Data Corporation relates EBCDIC to 8-bit ASCII.
EBCDIC-CCC
In recode, the us..ebcdic-ccc and ebcdic-ccc..us
conversions represent the way Concurrent Computer Corporation (formerly
Perkin Elmer) relates EBCDIC to 8-bit ASCII.
EBCDIC-IBM
In recode, the us..ebcdic-ibm conversion is almost
identical to the GNU ‘dd conv=ibm’ conversion. Given the exact
‘dd conv=ibm’ conversion table, recode once said:
Codes 91 and 213 both recode to 173
Codes 93 and 229 both recode to 189
No character recodes to 74
No character recodes to 106
So I arbitrarily chose to recode 213 to 74 and 229 to 106. This makes the
EBCDIC-IBM recoding reversible, but this is not necessarily the best
correction. In any case, I think that GNU dd should be amended.
dd and recode should ideally agree on the same correction.
So, this table might change once again.
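Since the text equates us..ebcdic with ‘dd conv=ebcdic’ and ebcdic..us with ‘dd conv=ascii’, the round trip can be sketched with dd alone, without recode installed. This is only an illustration: the helper names to_ebcdic_hex and ebcdic_round_trip are hypothetical, and a dd that supports the ebcdic and ascii conversions (such as GNU dd) is assumed.

```shell
# Hex-dump the EBCDIC bytes that dd produces for an ASCII string.
# to_ebcdic_hex is a hypothetical helper name, not part of recode or dd.
to_ebcdic_hex() {
    printf '%s' "$1" | dd conv=ebcdic 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Convert to EBCDIC and straight back; plain letters should survive unchanged.
ebcdic_round_trip() {
    printf '%s' "$1" | dd conv=ebcdic 2>/dev/null | dd conv=ascii 2>/dev/null
}

to_ebcdic_hex 'Hello'; echo        # c885939396
ebcdic_round_trip 'Hello'; echo    # Hello
```

Letters sit at the same positions in all three variants, so this example does not exercise the codes on which the variants disagree.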
RFC 1345 brings into recode 15 other EBCDIC charsets, and 21 other
charsets having EBCDIC in at least one of their alias names. You can
get a list of all these by executing:
recode -l | grep -i ebcdic
Note that recode can convert a pure stream of EBCDIC characters,
but it does not know how to handle the binary data that is sometimes
placed between records to delimit them and build physical blocks. If ends
of lines are not marked, a fixed record size may produce something readable,
but VB or VBS blocking is likely to yield some garbage in
the converted results.
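For the fixed-record case only, one possible workaround outside recode is to let dd cut the stream into records while it converts. The sketch below assumes a dd that supports cbs together with the block and unblock conversions (GNU dd does); unblock_ebcdic is a hypothetical helper name, and the record length of 8 is chosen purely for the demonstration. It does nothing useful for VB or VBS blocking.

```shell
# Turn fixed-length EBCDIC records into newline-terminated ASCII lines.
# unblock_ebcdic is a hypothetical helper; $1 is the assumed record length.
unblock_ebcdic() {
    dd conv=ascii,unblock cbs="$1" 2>/dev/null
}

# Build two 8-byte EBCDIC records from two ASCII lines (padding with
# EBCDIC spaces), then convert them back to ASCII lines.
printf 'AB\nCDE\n' | dd conv=ebcdic,block cbs=8 2>/dev/null | unblock_ebcdic 8
```

The intermediate stream is 16 bytes with no line terminators at all, which is the situation the paragraph above describes.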
This charset is available in recode under the name IBM-PC,
with dos, MSDOS and pc as acceptable aliases.
The shortest way of specifying it in recode is pc.
The charset is aimed towards a PC microcomputer from IBM or any compatible.
This is an eight-bit code. This charset is fairly old in recode;
its tables were produced a long while ago by mere inspection of a printed
chart of the IBM-PC codes and glyphs.
It has CR-LF as its implied surface. This means that, if the original
line endings have to be preserved while going out of IBM-PC, they
should currently be added back through the use of a surface on the other
charset, or better, just never removed. Here are examples for both cases:
recode pc..l2/cl < input > output
recode pc/..l2 < input > output
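To see at the byte level what removing or adding back a CR-LF surface amounts to, here is a rough sketch that uses only standard tools rather than recode. The helpers strip_crlf and add_crlf are hypothetical names, and they only approximate what recode does: for instance, strip_crlf deletes every CR, not just those that precede an LF.

```shell
# strip_crlf approximates removing the CR-LF surface: it deletes every CR,
# not only those that precede LF. Hypothetical helper, not a recode feature.
strip_crlf() { tr -d '\r'; }

# add_crlf approximates applying the surface: each LF-terminated line
# is rewritten with a CR-LF ending.
add_crlf() { awk '{ printf "%s\r\n", $0 }'; }

# Inspect the line endings before and after with od.
printf 'one\r\ntwo\r\n' | strip_crlf | od -An -c
printf 'one\ntwo\n' | add_crlf | od -An -c
```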
RFC 1345 brings into recode 44 ‘IBM’ charsets or code pages,
and also 8 other code pages. You can get a list of all these by
executing:
recode -l | egrep -i '(CP|IBM)[0-9]'
All charsets or aliases beginning with the letters ‘CP’ or ‘IBM’
also have CR-LF as their implied surface. The same is true for a
purely numeric alias in the same family. For example, all of 819,
CP819 and IBM819 imply CR-LF as a surface. Note that
ISO-8859-1 does not imply a surface, even though it shares the
same tabular data as 819.
There are a few discrepancies between this IBM-PC charset and the
very similar RFC 1345 charset ibm437, which have not been analysed
yet, so the charsets are being kept separate for now. This might change in
the future, and the IBM-PC charset might disappear. Wizards would
be interested in comparing the output of these two commands:
recode -vh IBM-PC..Latin-1
recode -vh IBM437..Latin-1
The first command uses the charset that predates the introduction of RFC 1345. The two commands give different recodings. These differences are annoying; the fuzziness will have to be explained and settled one day.
This charset is available in recode under the name
Icon-QNX, with QNX as an acceptable alias.
Files in this charset use Unisys’ Icon way of representing diacritics with code 25 escape sequences, under the QNX system. This is a seven-bit code, even though eight-bit codes can flow through as part of the IBM-PC charset.
Note on the recode -l pipelines shown earlier: on DOS/Windows, stock shells do not know that apostrophes quote special characters like |, so one needs to use double quotes instead of apostrophes around the pattern.