iconv library
The recode library itself contains most code and tables from the portable iconv library, written by Bruno Haible. In fact, many capabilities of the recode library are duplicated because of this merging, as the older recode and iconv libraries share many charsets. Here we discuss the issues related to this duplication, along with other peculiarities specific to the iconv library. The plan is to remove duplications and better integrate the specific features of each library as recode evolves.
As implemented, if a recoding request can be satisfied by the recode library both with and without its iconv library part, it is likely that the iconv library will be used. To find out whether the iconv library is indeed used or not, just use the ‘-v’ or ‘--verbose’ option (see Recoding).
The :libiconv: charset represents a conceptual pivot charset within the iconv part of the recode library (in fact, this pivot exists, but is not directly reachable). Its only alias is a mere : (a colon). Recoding from or to this charset directly is not allowed. But when this charset is selected as an intermediate, usually by automatic means, the iconv part of the recode library is called to handle the transformations.
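The pivot idea can be pictured as two chained conversions. The sketch below is only an illustration, not how recode reaches its internal :libiconv: pivot. It calls the standard POSIX iconv_open/iconv/iconv_close functions directly and assumes UTF-8 as an explicit intermediate: input is first converted from the source charset to UTF-8, then from UTF-8 to the target charset. The helper names convert_once and recode_via_pivot, and the 4096-byte pivot buffer, are hypothetical.

```c
#include <iconv.h>
#include <stddef.h>

/* One complete iconv pass: convert LEN bytes of IN from FROM to TO into OUT.
   Returns the number of bytes written, or (size_t)-1 on any failure. */
static size_t convert_once(const char *to, const char *from,
                           const char *in, size_t len,
                           char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);            /* note: target comes first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;                       /* iconv wants non-const char ** */
    char *outp = out;
    size_t inleft = len, outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush any shift state */
    iconv_close(cd);
    return rc == (size_t)-1 ? (size_t)-1 : outcap - outleft;
}

/* Hypothetical pivot recoding: FROM -> UTF-8 -> TO, with UTF-8 standing in
   for the intermediate charset.  Assumes the UTF-8 form fits in 4096 bytes. */
size_t recode_via_pivot(const char *to, const char *from,
                        const char *in, size_t len,
                        char *out, size_t outcap)
{
    char pivot[4096];
    size_t n = convert_once("UTF-8", from, in, len, pivot, sizeof pivot);
    if (n == (size_t)-1)
        return (size_t)-1;                        /* first leg failed */
    return convert_once(to, "UTF-8", pivot, n, out, outcap);
}
```

A real implementation would stream through a bounded pivot buffer rather than assume short input. The fixed buffer just keeps the sketch brief.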
By using an ‘--ignore=:libiconv:’ option on the recode call, or equivalently but more simply ‘-x:’, recode is instructed to avoid this charset entirely as an intermediate. As a consequence, the iconv part of the library is bypassed. Consider these two calls:
    recode l1..1250 < input > output
    recode -x: l1..1250 < input > output
Both should read ISO-8859-1 text from input and write CP1250 text to output. The first call uses the iconv part of the library, while the second call avoids it. Whatever the path used, the results should normally be identical. However, there might be observable differences. Most of them would result from reversibility issues, because the iconv engine, which the recode library directly uses for the time being, does not address reversibility. Though much less likely, some differences might result from slight errors in the tables used; such differences should then be reported as bugs.
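One practical way to look for reversibility problems on a given input is to check whether it survives a round trip through the target charset and back. The sketch below is not part of recode. It uses the POSIX iconv API directly, and the function names convert and round_trips, as well as the 1024-byte buffers, are hypothetical. It assumes the local iconv implementation knows the charset names passed to it.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert LEN bytes of IN from FROM to TO into OUT with a single iconv pass.
   Returns bytes written, or (size_t)-1 on failure. */
static size_t convert(const char *to, const char *from,
                      const char *in, size_t len, char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;
    char *inp = (char *)in, *outp = out;
    size_t inleft = len, outleft = outcap;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);   /* flush shift state */
    iconv_close(cd);
    return rc == (size_t)-1 ? (size_t)-1 : outcap - outleft;
}

/* Hypothetical check: does IN (in charset A) come back byte-for-byte
   identical after A -> B -> A?  Returns 1 if so, 0 otherwise, including
   when either leg fails.  Input is assumed to be short. */
int round_trips(const char *a, const char *b, const char *in, size_t len)
{
    char mid[1024], back[1024];
    size_t n = convert(b, a, in, len, mid, sizeof mid);
    if (n == (size_t)-1)
        return 0;                                  /* A -> B failed */
    size_t m = convert(a, b, mid, n, back, sizeof back);
    if (m == (size_t)-1)
        return 0;                                  /* B -> A failed */
    return m == len && memcmp(back, in, len) == 0;
}
```

For example, round_trips("ISO-8859-1", "CP1250", "caf\xe9", 4) should return 1, because é has the same code 0xE9 in both charsets. The single byte 0xE0 (à, which CP1250 lacks) should make it return 0.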
Other irregularities might be seen in the area of error detection and recovery. The recode library usually tries to detect canonicity errors in input and the production of ambiguous output, but the iconv part of the library currently does not. Input is always validated, however. The recode library may not always react properly when its iconv part has no translation for a given character.
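At the iconv level, these situations are reported through errno. The sketch below is written directly against the POSIX iconv API rather than recode's own interfaces. It classifies the documented failure cases:

- EILSEQ: an invalid input sequence, or, with glibc and GNU libiconv, a character with no translation in the target charset.
- EINVAL: input truncated in the middle of a multibyte sequence.
- E2BIG: output buffer full.

The names check_conversion and conv_status are hypothetical. Some iconv implementations substitute a replacement character for untranslatable input instead of failing, so that particular EILSEQ case is implementation-dependent.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Outcome of a one-shot conversion, as reported by iconv(3) through errno. */
enum conv_status {
    CONV_OK,          /* all input converted */
    CONV_ILLEGAL,     /* EILSEQ: invalid input, or (glibc/GNU libiconv) no translation */
    CONV_INCOMPLETE,  /* EINVAL: input ends inside a multibyte sequence */
    CONV_NO_ROOM,     /* E2BIG: output buffer too small */
    CONV_ERROR        /* iconv_open failed, or some other error */
};

/* Convert LEN bytes of IN from FROM to TO into OUT.  *OFFSET receives the
   byte offset in IN where conversion stopped (LEN on success).  The final
   flush call is omitted for brevity, which is fine for stateless charsets. */
enum conv_status check_conversion(const char *to, const char *from,
                                  const char *in, size_t len,
                                  char *out, size_t outcap, size_t *offset)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return CONV_ERROR;
    char *inp = (char *)in, *outp = out;
    size_t inleft = len, outleft = outcap;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    int err = errno;                 /* save before iconv_close can change it */
    iconv_close(cd);
    *offset = len - inleft;
    if (rc != (size_t)-1)
        return CONV_OK;
    switch (err) {
    case EILSEQ: return CONV_ILLEGAL;
    case EINVAL: return CONV_INCOMPLETE;
    case E2BIG:  return CONV_NO_ROOM;
    default:     return CONV_ERROR;
    }
}
```

For instance, feeding the UTF-8 bytes "ab\xff" should stop with CONV_ILLEGAL and an offset of 2, since the byte 0xFF can never start a valid UTF-8 sequence.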
Within a collection of names for a single charset, the recode library distinguishes one of them as being the genuine charset name, while the others are said to be aliases. When recode lists all charsets, for example with the ‘-l’ or ‘--list’ option, the list includes all iconv library charsets. The selection of one of the names as the genuine charset name is an artifact added by recode; it does not come from iconv. Moreover, the recode library dynamically resolves some conflicts when it initialises itself at runtime. This might explain some discrepancies in the table below as to which name is the genuine one.
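The distinction between a genuine name and its aliases amounts to a lookup table. The following sketch uses a few entries copied from the table below. The struct alias type, the aliases array and genuine_name are hypothetical, not recode's data structures. Case-insensitive matching is an assumption, though it is consistent with IANA charset names, from which many of these aliases come. The sketch does not model recode's runtime conflict resolution.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical alias record: an alternative name and the name listed as genuine. */
struct alias { const char *alias; const char *genuine; };

/* Small excerpt of the table below. */
static const struct alias aliases[] = {
    { "ASCII",          "US-ASCII"   },
    { "ANSI_X3.4-1968", "US-ASCII"   },
    { "LATIN1",         "ISO-8859-1" },
    { "L1",             "ISO-8859-1" },
    { "CP819",          "ISO-8859-1" },
    { "WINDOWS-1250",   "CP1250"     },
    { "MS-EE",          "CP1250"     },
    { "UTF8",           "UTF-8"      },
};

/* Return the genuine name for NAME, or NAME itself when it is not a known
   alias.  Comparison ignores case (an assumption, see above). */
const char *genuine_name(const char *name)
{
    for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
        if (strcasecmp(name, aliases[i].alias) == 0)
            return aliases[i].genuine;
    return name;
}
```

With this excerpt, genuine_name("latin1") yields "ISO-8859-1", while a name that is not listed as an alias, such as "KOI8-R", is returned unchanged.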
US-ASCII
    ASCII, ISO646-US, ISO_646.IRV:1991, ISO-IR-6, ANSI_X3.4-1968, CP367, IBM367, US, csASCII and ISO646.1991-IRV are aliases for this charset.
UTF-8
    UTF8 is an alias for this charset.
UCS-2
    ISO-10646-UCS-2 and csUnicode are aliases for this charset.
UCS-2BE
    UNICODEBIG, UNICODE-1-1 and csUnicode11 are aliases for this charset.
UCS-2LE
    UNICODELITTLE is an alias for this charset.
UCS-4
    ISO-10646-UCS-4 and csUCS4 are aliases for this charset.
UCS-4BE
UCS-4LE
UTF-16
UTF-16BE
UTF-16LE
UTF-7
    UNICODE-1-1-UTF-7 and csUnicode11UTF7 are aliases for this charset.
UCS-2-INTERNAL
UCS-2-SWAPPED
UCS-4-INTERNAL
UCS-4-SWAPPED
JAVA
ISO-8859-1
    ISO_8859-1, ISO_8859-1:1987, ISO-IR-100, CP819, IBM819, LATIN1, L1, csISOLatin1, ISO8859-1 and ISO8859_1 are aliases for this charset.
ISO-8859-2
    ISO_8859-2, ISO_8859-2:1987, ISO-IR-101, LATIN2, L2, csISOLatin2, ISO8859-2 and ISO8859_2 are aliases for this charset.
ISO-8859-3
    ISO_8859-3, ISO_8859-3:1988, ISO-IR-109, LATIN3, L3, csISOLatin3, ISO8859-3 and ISO8859_3 are aliases for this charset.
ISO-8859-4
    ISO_8859-4, ISO_8859-4:1988, ISO-IR-110, LATIN4, L4, csISOLatin4, ISO8859-4 and ISO8859_4 are aliases for this charset.
ISO-8859-5
    ISO_8859-5, ISO_8859-5:1988, ISO-IR-144, CYRILLIC, csISOLatinCyrillic, ISO8859-5 and ISO8859_5 are aliases for this charset.
ISO-8859-6
    ISO_8859-6, ISO_8859-6:1987, ISO-IR-127, ECMA-114, ASMO-708, ARABIC, csISOLatinArabic, ISO8859-6 and ISO8859_6 are aliases for this charset.
ISO-8859-7
    ISO_8859-7, ISO_8859-7:1987, ISO-IR-126, ECMA-118, ELOT_928, GREEK8, GREEK, csISOLatinGreek, ISO8859-7 and ISO8859_7 are aliases for this charset.
ISO-8859-8
    ISO_8859-8, ISO_8859-8:1988, ISO-IR-138, HEBREW, csISOLatinHebrew, ISO8859-8 and ISO8859_8 are aliases for this charset.
ISO-8859-9
    ISO_8859-9, ISO_8859-9:1989, ISO-IR-148, LATIN5, L5, csISOLatin5, ISO8859-9 and ISO8859_9 are aliases for this charset.
ISO-8859-10
    ISO_8859-10, ISO_8859-10:1992, ISO-IR-157, LATIN6, L6, csISOLatin6 and ISO8859-10 are aliases for this charset.
ISO-8859-13
    ISO_8859-13, ISO-IR-179, LATIN7 and L7 are aliases for this charset.
ISO-8859-14
    ISO_8859-14, ISO_8859-14:1998, ISO-IR-199, LATIN8 and L8 are aliases for this charset.
ISO-8859-15
    ISO_8859-15, ISO_8859-15:1998 and ISO-IR-203 are aliases for this charset.
ISO-8859-16
    ISO_8859-16, ISO_8859-16:2000 and ISO-IR-226 are aliases for this charset.
KOI8-R
    csKOI8R is an alias for this charset.
KOI8-U
KOI8-RU
CP1250
    WINDOWS-1250 and MS-EE are aliases for this charset.
CP1251
    WINDOWS-1251 and MS-CYRL are aliases for this charset.
CP1252
    WINDOWS-1252 and MS-ANSI are aliases for this charset.
CP1253
    WINDOWS-1253 and MS-GREEK are aliases for this charset.
CP1254
    WINDOWS-1254 and MS-TURK are aliases for this charset.
CP1255
    WINDOWS-1255 and MS-HEBR are aliases for this charset.
CP1256
    WINDOWS-1256 and MS-ARAB are aliases for this charset.
CP1257
    WINDOWS-1257 and WINBALTRIM are aliases for this charset.
CP1258
    WINDOWS-1258 is an alias for this charset.
CP850
    IBM850, 850 and csPC850Multilingual are aliases for this charset.
CP866
    IBM866, 866 and csIBM866 are aliases for this charset.
MacRoman
    Macintosh, MAC and csMacintosh are aliases for this charset.
MacCentralEurope
MacIceland
MacCroatian
MacRomania
MacCyrillic
MacUkraine
MacGreek
MacTurkish
MacHebrew
MacArabic
MacThai
HP-ROMAN8
    ROMAN8, R8 and csHPRoman8 are aliases for this charset.
NEXTSTEP
ARMSCII-8
Georgian-Academy
Georgian-PS
MuleLao-1
CP1133
    IBM-CP1133 is an alias for this charset.
TIS-620
    TIS620, TIS620-0, TIS620.2529-1, TIS620.2533-0, TIS620.2533-1 and ISO-IR-166 are aliases for this charset.
CP874
    WINDOWS-874 is an alias for this charset.
VISCII
    VISCII1.1-1 and csVISCII are aliases for this charset.
TCVN
    TCVN-5712, TCVN5712-1 and TCVN5712-1:1993 are aliases for this charset.
JIS_C6220-1969-RO
    ISO646-JP, ISO-IR-14, JP and csISO14JISC6220ro are aliases for this charset.
JIS_X0201
    JISX0201-1976, X0201, csHalfWidthKatakana, JISX0201.1976-0 and JIS0201 are aliases for this charset.
JIS_X0208
    JIS_X0208-1983, JIS_X0208-1990, JIS0208, X0208, ISO-IR-87, csISO87JISX0208, JISX0208.1983-0 and JISX0208.1990-0 are aliases for this charset.
JIS_X0212
    JIS_X0212.1990-0, JIS_X0212-1990, X0212, ISO-IR-159, csISO159JISX02121990, JISX0212.1990-0 and JIS0212 are aliases for this charset.
GB_1988-80
    ISO646-CN, ISO-IR-57, CN and csISO57GB1988 are aliases for this charset.
GB_2312-80
    ISO-IR-58, csISO58GB231280, CHINESE and GB2312.1980-0 are aliases for this charset.
ISO-IR-165
    CN-GB-ISOIR165 is an alias for this charset.
KSC_5601
    KS_C_5601-1987, KS_C_5601-1989, ISO-IR-149, csKSC56011987, KOREAN, KSC5601.1987-0 and KSX1001:1992 are aliases for this charset.
EUC-JP
    EUCJP, Extended_UNIX_Code_Packed_Format_for_Japanese, csEUCPkdFmtJapanese and EUC_JP are aliases for this charset.
SJIS
    SHIFT_JIS, SHIFT-JIS, MS_KANJI and csShiftJIS are aliases for this charset.
CP932
ISO-2022-JP
    csISO2022JP and ISO2022JP are aliases for this charset.
ISO-2022-JP-1
ISO-2022-JP-2
    csISO2022JP2 is an alias for this charset.
EUC-CN
    EUCCN, GB2312, CN-GB, csGB2312 and EUC_CN are aliases for this charset.
GBK
    CP936 is an alias for this charset.
GB18030
ISO-2022-CN
    csISO2022CN and ISO2022CN are aliases for this charset.
ISO-2022-CN-EXT
HZ
    HZ-GB-2312 is an alias for this charset.
EUC-TW
    EUCTW, csEUCTW and EUC_TW are aliases for this charset.
BIG5
    BIG-5, BIG-FIVE, BIGFIVE, CN-BIG5 and csBig5 are aliases for this charset.
CP950
BIG5HKSCS
EUC-KR
    EUCKR, csEUCKR and EUC_KR are aliases for this charset.
CP949
    UHC is an alias for this charset.
JOHAB
    CP1361 is an alias for this charset.
ISO-2022-KR
    csISO2022KR and ISO2022KR are aliases for this charset.
CHAR
WCHAR_T