iconv library

The recode library itself contains most code and tables from the
portable iconv library, written by Bruno Haible. In fact, many
capabilities of the recode library are duplicated because of this
merging, as the older recode and iconv libraries share many
charsets. This section discusses the issues related to this duplication, and
other peculiarities specific to the iconv library. The plan is to
remove the duplication and merge the two parts more cleanly as recode evolves.
As implemented, if a recoding request can be satisfied by the recode
library both with and without its iconv library part, it is likely
that the iconv library will be used. To find out whether the iconv part
is indeed used or not, use the ‘-v’ or ‘--verbose’ option;
see Recoding.
The :libiconv: charset represents a conceptual pivot charset
within the iconv part of the recode library (in fact,
this pivot exists, but is not directly reachable). Its only alias is
‘:’ (a lone colon). Recoding directly from or to this charset is not
allowed. However, when this charset is selected as an
intermediate, usually by automatic means, then the iconv part
of the recode library is called to handle the transformations.
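The idea of recoding through a pivot can be sketched in a few lines of Python. This is only an illustration of the concept, not the recode or iconv implementation: it assumes Python's built-in codecs stand in for the charset tables and that Unicode text plays the role of the pivot. The function name `recode_via_pivot` is hypothetical.

```python
def recode_via_pivot(data: bytes, source: str, target: str) -> bytes:
    """Recode bytes by going through an intermediate (pivot) representation.

    Assumption: Python's codecs stand in for the charset tables, and a
    Unicode string stands in for the pivot charset.
    """
    pivot = data.decode(source)   # source charset -> pivot
    return pivot.encode(target)   # pivot -> target charset


if __name__ == "__main__":
    latin1_bytes = "café".encode("iso-8859-1")
    print(recode_via_pivot(latin1_bytes, "iso-8859-1", "cp1250"))
```

As in the text above, the pivot is never the source or the target of a request; it only appears between the two steps.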
With the ‘--ignore=:libiconv:’ option on the recode call or,
equivalently but more simply, ‘-x:’, recode is instructed
to avoid this charset entirely as an intermediate, so that
the iconv part of the library is bypassed. Consider these two calls:

    recode l1..1250 < input > output
    recode -x: l1..1250 < input > output

Both should transform input from ISO-8859-1 to CP1250
on output. The first call uses the iconv part of the library,
while the second call avoids it. Whatever the path used, the results should
normally be identical. However, there might be observable differences.
Most of them might result from reversibility issues, as the iconv
engine, which the recode library directly uses for the time being,
does not address reversibility. Although much less likely, some differences
might result from slight errors in the tables used. Such differences should
be reported as bugs.
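One way to check whether the two paths really agree is to compare their outputs byte by byte. The helper below is a minimal sketch for that purpose; the function name `first_difference` is hypothetical, and the two byte strings would typically be the contents of the output files produced by the two calls.

```python
from typing import Optional


def first_difference(left: bytes, right: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(left, right)):
        if a != b:
            return offset
    # One input may simply be a prefix of the other.
    if len(left) != len(right):
        return min(len(left), len(right))
    return None
```

An offset returned by this helper points at the first byte worth inspecting when deciding whether the difference is a reversibility effect or a table error.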
Other irregularities might be seen in the area of error detection and
recovery. The recode library usually tries to detect canonicity
errors in input, and production of ambiguous output, but the iconv
part of the library currently does not. Input is always validated, however.
The recode library may not always react properly when its iconv
part has no translation for a given character.
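Before recoding a file, it can help to list the characters that have no translation in the target charset. The sketch below does this with Python's built-in codecs, which are assumed here as a stand-in for the recode and iconv tables; the two may not agree in every detail. The function name `untranslatable` is hypothetical.

```python
from typing import List, Tuple


def untranslatable(data: bytes, source: str, target: str) -> List[Tuple[int, str]]:
    """List (character index, character) pairs that the target cannot encode.

    For a single-byte source charset, the character index is also the
    byte offset in the input.
    """
    text = data.decode(source)
    missing = []
    for index, char in enumerate(text):
        try:
            char.encode(target)
        except UnicodeEncodeError:
            missing.append((index, char))
    return missing


if __name__ == "__main__":
    # U+00E3 (a with tilde) exists in ISO-8859-1 but not in CP1250.
    print(untranslatable(b"S\xe3o", "iso-8859-1", "cp1250"))
```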
Within a collection of names for a single charset, the recode
library distinguishes one of them as being the genuine charset name,
while the others are said to be aliases. When recode lists all
charsets, for example with the ‘-l’ or ‘--list’ option, the list
includes all iconv library charsets. The selection of one of the
aliases as the genuine charset name is an artifact added by recode;
it does not come from iconv. Moreover, the recode library
dynamically resolves some conflicts when it initialises itself at runtime.
This might explain some discrepancies in the table below regarding which
name is the genuine charset name.

US-ASCII
    ASCII, ISO646-US, ISO_646.IRV:1991, ISO-IR-6, ANSI_X3.4-1968, CP367, IBM367, US, csASCII and ISO646.1991-IRV are aliases for this charset.
UTF-8
    UTF8 is an alias for this charset.
UCS-2
    ISO-10646-UCS-2 and csUnicode are aliases for this charset.
UCS-2BE
    UNICODEBIG, UNICODE-1-1 and csUnicode11 are aliases for this charset.
UCS-2LE
    UNICODELITTLE is an alias for this charset.
UCS-4
    ISO-10646-UCS-4 and csUCS4 are aliases for this charset.
UCS-4BE
UCS-4LE
UTF-16
UTF-16BE
UTF-16LE
UTF-7
    UNICODE-1-1-UTF-7 and csUnicode11UTF7 are aliases for this charset.
UCS-2-INTERNAL
UCS-2-SWAPPED
UCS-4-INTERNAL
UCS-4-SWAPPED
JAVA
ISO-8859-1
    ISO_8859-1, ISO_8859-1:1987, ISO-IR-100, CP819, IBM819, LATIN1, L1, csISOLatin1, ISO8859-1 and ISO8859_1 are aliases for this charset.
ISO-8859-2
    ISO_8859-2, ISO_8859-2:1987, ISO-IR-101, LATIN2, L2, csISOLatin2, ISO8859-2 and ISO8859_2 are aliases for this charset.
ISO-8859-3
    ISO_8859-3, ISO_8859-3:1988, ISO-IR-109, LATIN3, L3, csISOLatin3, ISO8859-3 and ISO8859_3 are aliases for this charset.
ISO-8859-4
    ISO_8859-4, ISO_8859-4:1988, ISO-IR-110, LATIN4, L4, csISOLatin4, ISO8859-4 and ISO8859_4 are aliases for this charset.
ISO-8859-5
    ISO_8859-5, ISO_8859-5:1988, ISO-IR-144, CYRILLIC, csISOLatinCyrillic, ISO8859-5 and ISO8859_5 are aliases for this charset.
ISO-8859-6
    ISO_8859-6, ISO_8859-6:1987, ISO-IR-127, ECMA-114, ASMO-708, ARABIC, csISOLatinArabic, ISO8859-6 and ISO8859_6 are aliases for this charset.
ISO-8859-7
    ISO_8859-7, ISO_8859-7:1987, ISO-IR-126, ECMA-118, ELOT_928, GREEK8, GREEK, csISOLatinGreek, ISO8859-7 and ISO8859_7 are aliases for this charset.
ISO-8859-8
    ISO_8859-8, ISO_8859-8:1988, ISO-IR-138, HEBREW, csISOLatinHebrew, ISO8859-8 and ISO8859_8 are aliases for this charset.
ISO-8859-9
    ISO_8859-9, ISO_8859-9:1989, ISO-IR-148, LATIN5, L5, csISOLatin5, ISO8859-9 and ISO8859_9 are aliases for this charset.
ISO-8859-10
    ISO_8859-10, ISO_8859-10:1992, ISO-IR-157, LATIN6, L6, csISOLatin6 and ISO8859-10 are aliases for this charset.
ISO-8859-13
    ISO_8859-13, ISO-IR-179, LATIN7 and L7 are aliases for this charset.
ISO-8859-14
    ISO_8859-14, ISO_8859-14:1998, ISO-IR-199, LATIN8 and L8 are aliases for this charset.
ISO-8859-15
    ISO_8859-15, ISO_8859-15:1998 and ISO-IR-203 are aliases for this charset.
ISO-8859-16
    ISO_8859-16, ISO_8859-16:2000 and ISO-IR-226 are aliases for this charset.
KOI8-R
    csKOI8R is an alias for this charset.
KOI8-U
KOI8-RU
CP1250
    WINDOWS-1250 and MS-EE are aliases for this charset.
CP1251
    WINDOWS-1251 and MS-CYRL are aliases for this charset.
CP1252
    WINDOWS-1252 and MS-ANSI are aliases for this charset.
CP1253
    WINDOWS-1253 and MS-GREEK are aliases for this charset.
CP1254
    WINDOWS-1254 and MS-TURK are aliases for this charset.
CP1255
    WINDOWS-1255 and MS-HEBR are aliases for this charset.
CP1256
    WINDOWS-1256 and MS-ARAB are aliases for this charset.
CP1257
    WINDOWS-1257 and WINBALTRIM are aliases for this charset.
CP1258
    WINDOWS-1258 is an alias for this charset.
CP850
    IBM850, 850 and csPC850Multilingual are aliases for this charset.
CP866
    IBM866, 866 and csIBM866 are aliases for this charset.
MacRoman
    Macintosh, MAC and csMacintosh are aliases for this charset.
MacCentralEurope
MacIceland
MacCroatian
MacRomania
MacCyrillic
MacUkraine
MacGreek
MacTurkish
MacHebrew
MacArabic
MacThai
HP-ROMAN8
    ROMAN8, R8 and csHPRoman8 are aliases for this charset.
NEXTSTEP
ARMSCII-8
Georgian-Academy
Georgian-PS
MuleLao-1
CP1133
    IBM-CP1133 is an alias for this charset.
TIS-620
    TIS620, TIS620-0, TIS620.2529-1, TIS620.2533-0, TIS620.2533-1 and ISO-IR-166 are aliases for this charset.
CP874
    WINDOWS-874 is an alias for this charset.
VISCII
    VISCII1.1-1 and csVISCII are aliases for this charset.
TCVN
    TCVN-5712, TCVN5712-1 and TCVN5712-1:1993 are aliases for this charset.
JIS_C6220-1969-RO
    ISO646-JP, ISO-IR-14, JP and csISO14JISC6220ro are aliases for this charset.
JIS_X0201
    JISX0201-1976, X0201, csHalfWidthKatakana, JISX0201.1976-0 and JIS0201 are aliases for this charset.
JIS_X0208
    JIS_X0208-1983, JIS_X0208-1990, JIS0208, X0208, ISO-IR-87, csISO87JISX0208, JISX0208.1983-0, JISX0208.1990-0 and JIS0208 are aliases for this charset.
JIS_X0212
    JIS_X0212.1990-0, JIS_X0212-1990, X0212, ISO-IR-159, csISO159JISX02121990, JISX0212.1990-0 and JIS0212 are aliases for this charset.
GB_1988-80
    ISO646-CN, ISO-IR-57, CN and csISO57GB1988 are aliases for this charset.
GB_2312-80
    ISO-IR-58, csISO58GB231280, CHINESE and GB2312.1980-0 are aliases for this charset.
ISO-IR-165
    CN-GB-ISOIR165 is an alias for this charset.
KSC_5601
    KS_C_5601-1987, KS_C_5601-1989, ISO-IR-149, csKSC56011987, KOREAN, KSC5601.1987-0 and KSX1001:1992 are aliases for this charset.
EUC-JP
    EUCJP, Extended_UNIX_Code_Packed_Format_for_Japanese, csEUCPkdFmtJapanese and EUC_JP are aliases for this charset.
SJIS
    SHIFT_JIS, SHIFT-JIS, MS_KANJI and csShiftJIS are aliases for this charset.
CP932
ISO-2022-JP
    csISO2022JP and ISO2022JP are aliases for this charset.
ISO-2022-JP-1
ISO-2022-JP-2
    csISO2022JP2 is an alias for this charset.
EUC-CN
    EUCCN, GB2312, CN-GB, csGB2312 and EUC_CN are aliases for this charset.
GBK
    CP936 is an alias for this charset.
GB18030
ISO-2022-CN
    csISO2022CN and ISO2022CN are aliases for this charset.
ISO-2022-CN-EXT
HZ
    HZ-GB-2312 is an alias for this charset.
EUC-TW
    EUCTW, csEUCTW and EUC_TW are aliases for this charset.
BIG5
    BIG-5, BIG-FIVE, BIGFIVE, CN-BIG5 and csBig5 are aliases for this charset.
CP950
BIG5HKSCS
EUC-KR
    EUCKR, csEUCKR and EUC_KR are aliases for this charset.
CP949
    UHC is an alias for this charset.
JOHAB
    CP1361 is an alias for this charset.
ISO-2022-KR
    csISO2022KR and ISO2022KR are aliases for this charset.
CHAR
WCHAR_T
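
The relation between a genuine charset name and its aliases can be modelled as a simple lookup index. The sketch below uses a small excerpt of the table above; the names `ALIASES`, `build_index` and `genuine_name` are hypothetical, and treating names as case-insensitive is an assumption of this sketch rather than something stated in this section. It does not reproduce the conflict resolution that recode performs at initialisation.

```python
from typing import Dict, List, Optional

# Excerpt of the table above: genuine charset name -> aliases.
ALIASES: Dict[str, List[str]] = {
    "ISO-8859-1": ["ISO_8859-1", "ISO-IR-100", "CP819", "IBM819", "LATIN1", "L1"],
    "CP1250": ["WINDOWS-1250", "MS-EE"],
    "UTF-8": ["UTF8"],
}


def build_index(table: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every name (genuine or alias) to its genuine charset name."""
    index: Dict[str, str] = {}
    for genuine, aliases in table.items():
        for name in [genuine, *aliases]:
            # Assumption: names are compared case-insensitively.
            index[name.upper()] = genuine
    return index


INDEX = build_index(ALIASES)


def genuine_name(name: str) -> Optional[str]:
    """Return the genuine charset name for a name, or None if unknown."""
    return INDEX.get(name.upper())
```

With this index, `genuine_name("l1")` and `genuine_name("LATIN1")` both resolve to the same genuine name, which mirrors how the two names in a request such as `l1..1250` each designate a single charset.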