Python comes with a number of codecs built in, implemented either as C functions or with
dictionaries as mapping tables. The following table lists the codecs by name, together with a
few common aliases and the languages for which each encoding is likely used. Neither the list
of aliases nor the list of languages is meant to be exhaustive. Note that spelling
alternatives that differ only in case or use a hyphen instead of an underscore are also valid
aliases.

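As a minimal sketch of this alias resolution, assuming a Python 3 interpreter: codecs.lookup() normalizes the name it is given and returns a CodecInfo object whose name attribute is the codec's canonical name, so every spelling below should resolve to the same codec. The helper canonical_name is hypothetical and exists only for this illustration.

```python
import codecs


def canonical_name(alias):
    """Return the canonical codec name an alias resolves to (hypothetical helper)."""
    return codecs.lookup(alias).name


# Case and hyphen/underscore variants of one alias all resolve to the same codec.
latin1_spellings = ["latin_1", "Latin-1", "LATIN1", "iso-8859-1", "ISO_8859_1", "L1", "cp819", "8859"]
ascii_spellings = ["ascii", "646", "us-ascii", "US_ASCII"]

for spelling in latin1_spellings + ascii_spellings:
    print(f"{spelling!r:>14} -> {canonical_name(spelling)}")
```
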
Many of the character sets support the same languages. They vary in individual characters
(e.g. whether the EURO SIGN is supported or not), and in the assignment of characters to code
positions. For the European languages in particular, the following variants typically exist:
- an ISO 8859 codeset
- a Microsoft Windows code page, which is typically derived from an ISO 8859 codeset but
replaces control characters with additional graphic characters
- an IBM EBCDIC code page
- an IBM PC code page, which is ASCII compatible
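The following sketch, assuming Python 3, illustrates those differences with the EURO SIGN (U+20AC) and with byte 0x80, which is a control character in latin_1 but a graphic character in cp1252. The particular codecs compared are only examples picked from the table.

```python
# Compare how several European code pages treat the same characters.
euro = "\u20ac"  # EURO SIGN

# Windows code page vs. the ISO 8859 revision that added the euro sign.
print("cp1252:    ", euro.encode("cp1252"))
print("iso8859_15:", euro.encode("iso8859_15"))

# ISO 8859-1 (latin_1) has no euro sign at all, so encoding fails.
try:
    euro.encode("latin_1")
except UnicodeEncodeError as exc:
    print("latin_1:    cannot encode:", exc.reason)

# Byte 0x80 is a C1 control character in latin_1 but a graphic character in cp1252.
print(repr(b"\x80".decode("latin_1")), repr(b"\x80".decode("cp1252")))

# The IBM PC code page cp437 is ASCII compatible; the EBCDIC code page cp037 is not.
print("A".encode("cp437"), "A".encode("cp037"))
```
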

| Codec | Aliases | Languages |
|---|---|---|
| ascii | 646, us-ascii | English |
| cp037 | IBM037, IBM039 | English |
| cp424 | EBCDIC-CP-HE, IBM424 | Hebrew |
| cp437 | 437, IBM437 | English |
| cp500 | EBCDIC-CP-BE, EBCDIC-CP-CH, IBM500 | Western Europe |
| cp737 | | Greek |
| cp775 | IBM775 | Baltic languages |
| cp850 | 850, IBM850 | Western Europe |
| cp852 | 852, IBM852 | Central and Eastern Europe |
| cp855 | 855, IBM855 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
| cp856 | | Hebrew |
| cp857 | 857, IBM857 | Turkish |
| cp860 | 860, IBM860 | Portuguese |
| cp861 | 861, CP-IS, IBM861 | Icelandic |
| cp862 | 862, IBM862 | Hebrew |
| cp863 | 863, IBM863 | Canadian |
| cp864 | IBM864 | Arabic |
| cp865 | 865, IBM865 | Danish, Norwegian |
| cp869 | 869, CP-GR, IBM869 | Greek |
| cp874 | | Thai |
| cp875 | | Greek |
| cp1006 | | Urdu |
| cp1026 | ibm1026 | Turkish |
| cp1140 | ibm1140 | Western Europe |
| cp1250 | windows-1250 | Central and Eastern Europe |
| cp1251 | windows-1251 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
| cp1252 | windows-1252 | Western Europe |
| cp1253 | windows-1253 | Greek |
| cp1254 | windows-1254 | Turkish |
| cp1255 | windows-1255 | Hebrew |
| cp1256 | windows-1256 | Arabic |
| cp1257 | windows-1257 | Baltic languages |
| cp1258 | windows-1258 | Vietnamese |
| latin_1 | iso-8859-1, iso8859-1, 8859, cp819, latin, latin1, L1 | Western Europe |
| iso8859_2 | iso-8859-2, latin2, L2 | Central and Eastern Europe |
| iso8859_3 | iso-8859-3, latin3, L3 | Esperanto, Maltese |
| iso8859_4 | iso-8859-4, latin4, L4 | Baltic languages |
| iso8859_5 | iso-8859-5, cyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
| iso8859_6 | iso-8859-6, arabic | Arabic |
| iso8859_7 | iso-8859-7, greek, greek8 | Greek |
| iso8859_8 | iso-8859-8, hebrew | Hebrew |
| iso8859_9 | iso-8859-9, latin5, L5 | Turkish |
| iso8859_10 | iso-8859-10, latin6, L6 | Nordic languages |
| iso8859_13 | iso-8859-13 | Baltic languages |
| iso8859_14 | iso-8859-14, latin8, L8 | Celtic languages |
| iso8859_15 | iso-8859-15 | Western Europe |
| koi8_r | | Russian |
| koi8_u | | Ukrainian |
| mac_cyrillic | maccyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
| mac_greek | macgreek | Greek |
| mac_iceland | maciceland | Icelandic |
| mac_latin2 | maclatin2, maccentraleurope | Central and Eastern Europe |
| mac_roman | macroman | Western Europe |
| mac_turkish | macturkish | Turkish |
| utf_16 | U16, utf16 | all languages |
| utf_16_be | UTF-16BE | all languages (BMP only) |
| utf_16_le | UTF-16LE | all languages (BMP only) |
| utf_7 | U7 | all languages |
| utf_8 | U8, UTF, utf8 | all languages |

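The three UTF-16 entries differ mainly in how they handle byte order. A short sketch, assuming Python 3: utf_16 prepends a byte order mark (BOM) and writes the text in the machine's native byte order, while utf_16_le and utf_16_be fix the byte order and write no BOM. The sample text "Hi" is arbitrary.

```python
import codecs

text = "Hi"

# utf_16 writes a BOM followed by the text in the platform's native byte order,
# so the exact bytes can vary from machine to machine.
with_bom = text.encode("utf_16")
print(with_bom)

# The explicit-endianness variants write no BOM.
little = text.encode("utf_16_le")
big = text.encode("utf_16_be")
print(little, big)

# When decoding, utf_16 reads the BOM to choose the byte order, then strips it.
print(with_bom.decode("utf_16"))
print((codecs.BOM_UTF16_BE + big).decode("utf_16"))
```
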
A number of codecs are specific to Python, so their codec names have no meaning outside
Python. Some of them don't convert from Unicode strings to byte strings, but instead use the
property of the Python codecs machinery that any bijective function with one argument can be
considered an encoding.

For the codecs listed below, the result in the "encoding" direction is always a byte string.
The result of the "decoding" direction is listed as the operand type in the table.
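In Python 3 the byte-string codecs in the table below (base64_codec, hex_codec, quopri_codec, uu_codec, zlib_codec) map bytes to bytes and are reached through codecs.encode() and codecs.decode() rather than through str.encode(). A minimal sketch, assuming Python 3.4 or later (where short aliases such as "hex" and "zlib" are available), that round-trips one arbitrary payload through each of them:

```python
import codecs
import zlib

payload = b"hello, codecs\n"

# Each of these codecs maps bytes to bytes in Python 3 and is reversible.
for name in ["base64_codec", "hex_codec", "quopri_codec", "uu_codec", "zlib_codec"]:
    encoded = codecs.encode(payload, name)
    decoded = codecs.decode(encoded, name)
    assert decoded == payload  # decoding undoes encoding
    print(f"{name:>12}: {encoded[:40]!r}")

# hex_codec writes two lowercase hexadecimal digits per byte.
print(codecs.encode(b"\x00\xff", "hex"))

# zlib_codec produces a zlib stream, so the zlib module can decompress it directly.
print(zlib.decompress(codecs.encode(payload, "zlib")))
```
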

| Codec | Aliases | Operand type | Purpose |
|---|---|---|---|
| base64_codec | base64, base-64 | byte string | Convert the operand to MIME base64 |
| hex_codec | hex | byte string | Convert the operand to hexadecimal representation, with two digits per byte |
| idna | | Unicode string | Implements RFC 3490. New in version 2.3. See also encodings.idna |
| mbcs | dbcs | Unicode string | Windows only: Encode the operand according to the ANSI codepage (CP_ACP) |
| palmos | | Unicode string | Encoding of PalmOS 3.5 |
| punycode | | Unicode string | Implements RFC 3492. New in version 2.3. |
| quopri_codec | quopri, quoted-printable, quotedprintable | byte string | Convert the operand to MIME quoted printable |
| raw_unicode_escape | | Unicode string | Produce a string that is suitable as a raw Unicode literal in Python source code |
| rot_13 | rot13 | byte string | Return the Caesar-cipher encryption (shift of 13) of the operand |
| string_escape | | byte string | Produce a string that is suitable as a string literal in Python source code |
| undefined | | any | Raise an exception for all conversions. Can be used as the system encoding if no automatic coercion between byte and Unicode strings is desired. |
| unicode_escape | | Unicode string | Produce a string that is suitable as a Unicode literal in Python source code |
| unicode_internal | | Unicode string | Return the internal representation of the operand |
| uu_codec | uu | byte string | Convert the operand using uuencode |
| zlib_codec | zip, zlib | byte string | Compress the operand using zlib |

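A sketch of several of the Unicode-string codecs above, assuming Python 3. Some entries are left out: string_escape was removed in Python 3.0, unicode_internal was removed in Python 3.8, and mbcs exists only on Windows. Result types in Python 3 can differ from the Python 2 operand types in the table; for example, rot_13 maps str to str and is reachable only through codecs.encode(). The host name "bücher.example" is an arbitrary sample.

```python
import codecs

# idna (RFC 3490) converts each label of a host name to its ASCII-compatible form.
host = "bücher.example"
ascii_host = host.encode("idna")
print(ascii_host)
print(ascii_host.decode("idna"))

# punycode (RFC 3492) encodes a single label, without the "xn--" prefix.
print("bücher".encode("punycode"))

# unicode_escape produces backslash escapes usable inside a Python string literal.
print("é\u20ac\n".encode("unicode_escape"))

# raw_unicode_escape writes characters below U+0100 as raw Latin-1 bytes and
# escapes only the remaining characters as \uXXXX.
print("é\u20ac".encode("raw_unicode_escape"))

# In Python 3, rot_13 maps str to str and is used through codecs.encode().
print(codecs.encode("Hello", "rot_13"))

# undefined refuses every conversion.
try:
    "abc".encode("undefined")
except UnicodeError as exc:
    print("undefined:", exc)
```
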