
4.9.2 Standard Encodings

Python comes with a number of built-in codecs, implemented either as C functions or with dictionaries as mapping tables. The following table lists the codecs by name, together with a few common aliases and the languages for which the encoding is likely to be used. Neither the list of aliases nor the list of languages is meant to be exhaustive. Note that spelling alternatives that differ only in case, or that use a hyphen instead of an underscore, are also valid aliases.
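The alias rule can be checked with a short sketch. It assumes a current Python 3 interpreter (this page describes an older release), and the list of spellings is chosen only for illustration; `codecs.lookup` is the standard lookup function.

```python
import codecs

# Spellings that differ only in case, or in hyphen versus underscore,
# should all resolve to the same codec, as should listed aliases.
spellings = ["utf_8", "UTF-8", "Utf_8", "utf8", "U8"]
resolved = {codecs.lookup(name).name for name in spellings}
print(resolved)  # a set with a single canonical name

# A name that is neither a codec nor an alias raises LookupError.
try:
    codecs.lookup("no_such_codec")
    unknown_found = True
except LookupError:
    unknown_found = False
print(unknown_found)
```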

Many of the character sets support the same languages. They vary in individual characters (e.g. whether the EURO SIGN is supported or not), and in the assignment of characters to code positions. For the European languages in particular, the following variants typically exist:

 

  • an ISO 8859 codeset
  • a Microsoft Windows code page, which is typically derived from an 8859 codeset, but replaces control characters with additional graphic characters
  • an IBM EBCDIC code page
  • an IBM PC code page, which is ASCII compatible
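As a rough illustration of how these variants differ, the sketch below encodes the EURO SIGN and the letter "A" with a few of the codecs from the table. It assumes a current Python 3 interpreter, where text literals are Unicode strings; the variable names are illustrative only.

```python
euro = "\u20ac"  # EURO SIGN

# The same character lands on different code positions.
cp1252_euro = euro.encode("cp1252")      # Windows code page
latin9_euro = euro.encode("iso8859_15")  # ISO 8859-15

# ISO 8859-1 has no EURO SIGN at all, so encoding fails.
try:
    euro.encode("latin_1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False

# An IBM PC code page is ASCII compatible; an EBCDIC code page is not.
ascii_compatible = "A".encode("cp437") == "A".encode("ascii")
ebcdic_a = "A".encode("cp037")

print(cp1252_euro, latin9_euro, latin1_has_euro, ascii_compatible, ebcdic_a)
```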

 
Codec         Aliases                                                Languages
------------  -----------------------------------------------------  --------------------
ascii         646, us-ascii                                          English
cp037         IBM037, IBM039                                         English
cp424         EBCDIC-CP-HE, IBM424                                   Hebrew
cp437         437, IBM437                                            English
cp500         EBCDIC-CP-BE, EBCDIC-CP-CH, IBM500                     Western Europe
cp737                                                                Greek
cp775         IBM775                                                 Baltic languages
cp850         850, IBM850                                            Western Europe
cp852         852, IBM852                                            Central and Eastern Europe
cp855         855, IBM855                                            Bulgarian, Byelorussian, Macedonian, Russian, Serbian
cp856                                                                Hebrew
cp857         857, IBM857                                            Turkish
cp860         860, IBM860                                            Portuguese
cp861         861, CP-IS, IBM861                                     Icelandic
cp862         862, IBM862                                            Hebrew
cp863         863, IBM863                                            Canadian
cp864         IBM864                                                 Arabic
cp865         865, IBM865                                            Danish, Norwegian
cp869         869, CP-GR, IBM869                                     Greek
cp874                                                                Thai
cp875                                                                Greek
cp1006                                                               Urdu
cp1026        ibm1026                                                Turkish
cp1140        ibm1140                                                Western Europe
cp1250        windows-1250                                           Central and Eastern Europe
cp1251        windows-1251                                           Bulgarian, Byelorussian, Macedonian, Russian, Serbian
cp1252        windows-1252                                           Western Europe
cp1253        windows-1253                                           Greek
cp1254        windows-1254                                           Turkish
cp1255        windows-1255                                           Hebrew
cp1256        windows-1256                                           Arabic
cp1257        windows-1257                                           Baltic languages
cp1258        windows-1258                                           Vietnamese
latin_1       iso-8859-1, iso8859-1, 8859, cp819, latin, latin1, L1  Western Europe
iso8859_2     iso-8859-2, latin2, L2                                 Central and Eastern Europe
iso8859_3     iso-8859-3, latin3, L3                                 Esperanto, Maltese
iso8859_4     iso-8859-4, latin4, L4                                 Baltic languages
iso8859_5     iso-8859-5, cyrillic                                   Bulgarian, Byelorussian, Macedonian, Russian, Serbian
iso8859_6     iso-8859-6, arabic                                     Arabic
iso8859_7     iso-8859-7, greek, greek8                              Greek
iso8859_8     iso-8859-8, hebrew                                     Hebrew
iso8859_9     iso-8859-9, latin5, L5                                 Turkish
iso8859_10    iso-8859-10, latin6, L6                                Nordic languages
iso8859_13    iso-8859-13                                            Baltic languages
iso8859_14    iso-8859-14, latin8, L8                                Celtic languages
iso8859_15    iso-8859-15                                            Western Europe
koi8_r                                                               Russian
koi8_u                                                               Ukrainian
mac_cyrillic  maccyrillic                                            Bulgarian, Byelorussian, Macedonian, Russian, Serbian
mac_greek     macgreek                                               Greek
mac_iceland   maciceland                                             Icelandic
mac_latin2    maclatin2, maccentraleurope                            Central and Eastern Europe
mac_roman     macroman                                               Western Europe
mac_turkish   macturkish                                             Turkish
utf_16        U16, utf16                                             all languages
utf_16_be     UTF-16BE                                               all languages (BMP only)
utf_16_le     UTF-16LE                                               all languages (BMP only)
utf_7         U7                                                     all languages
utf_8         U8, UTF, utf8                                          all languages
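The difference between the UTF-16 entries in the table can be seen in a small sketch. It assumes a current Python 3 interpreter; `codecs.BOM_UTF16_BE` and `codecs.BOM_UTF16_LE` are standard constants, and the other names are illustrative.

```python
import codecs

text = "A"

# The _be and _le variants fix the byte order and write no byte order mark.
big_endian = text.encode("utf_16_be")
little_endian = text.encode("utf_16_le")

# Plain utf_16 prepends a byte order mark, then uses the platform's order.
generic = text.encode("utf_16")
has_bom = generic[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

# Decoding with utf_16 consumes the mark again.
round_trip = generic.decode("utf_16")

# utf_8 uses one byte for ASCII and more for other characters.
e_acute = "\u00e9".encode("utf_8")

print(big_endian, little_endian, has_bom, round_trip, e_acute)
```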

 

A number of codecs are specific to Python, so their codec names have no meaning outside Python. Some of them don't convert from Unicode strings to byte strings, but instead use the property of the Python codecs machinery that any bijective function with one argument can be considered as an encoding.
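A minimal sketch of such codecs follows. It assumes a current Python 3 interpreter, where these transforms are reached through `codecs.encode` and `codecs.decode` rather than string methods, and where `rot_13` operates on text rather than on byte strings; the sample data is illustrative.

```python
import codecs

data = b"hello"

# Byte string in, byte string out: no Unicode conversion is involved.
as_hex = codecs.encode(data, "hex_codec")
as_base64 = codecs.encode(data, "base64_codec")

# Compression is also exposed as a codec and is reversible.
original = data * 100
packed = codecs.encode(original, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")

# rot_13 is its own inverse: encoding twice restores the input.
rotated = codecs.encode("Hello", "rot_13")
restored = codecs.encode(rotated, "rot_13")

print(as_hex, as_base64, len(packed), rotated, restored)
```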

For the codecs listed below, the result in the "encoding" direction is always a byte string. The result of the "decoding" direction is listed as operand type in the table.

 
Codec               Aliases                                    Operand type    Purpose
------------------  -----------------------------------------  --------------  ------------------------------
base64_codec        base64, base-64                            byte string     Convert operand to MIME base64
hex_codec           hex                                        byte string     Convert operand to hexadecimal representation, with two digits per byte
idna                                                           Unicode string  Implements RFC 3490. New in version 2.3. See also encodings.idna
mbcs                dbcs                                       Unicode string  Windows only: Encode operand according to the ANSI codepage (CP_ACP)
palmos                                                         Unicode string  Encoding of PalmOS 3.5
punycode                                                       Unicode string  Implements RFC 3492. New in version 2.3.
quopri_codec        quopri, quoted-printable, quotedprintable  byte string     Convert operand to MIME quoted-printable
raw_unicode_escape                                             Unicode string  Produce a string that is suitable as a raw Unicode literal in Python source code
rot_13              rot13                                      byte string     Return the Caesar-cipher encryption of the operand
string_escape                                                  byte string     Produce a string that is suitable as a string literal in Python source code
undefined                                                      any             Raise an exception for all conversions. Can be used as the system encoding if no automatic coercion between byte and Unicode strings is desired.
unicode_escape                                                 Unicode string  Produce a string that is suitable as a Unicode literal in Python source code
unicode_internal                                               Unicode string  Return the internal representation of the operand
uu_codec            uu                                         byte string     Convert the operand using uuencode
zlib_codec          zip, zlib                                  byte string     Compress the operand using zlib (the compression method also used by gzip)
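Three of the Unicode-string codecs from the table can be tried as follows. This is a sketch assuming a current Python 3 interpreter; the label "bücher" and the host name under ".example" are illustrative inputs, not taken from the table.

```python
label = "b\u00fccher"  # "bücher"

# punycode (RFC 3492) moves the non-ASCII character into an ASCII suffix.
puny = label.encode("punycode")

# idna (RFC 3490) applies punycode per label and adds the "xn--" prefix.
host = (label + ".example").encode("idna")
host_back = host.decode("idna")

# unicode_escape yields the escapes one would write in a string literal.
escaped = "\u00e9\n".encode("unicode_escape")
unescaped = escaped.decode("unicode_escape")

print(puny, host, host_back, escaped)
```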
 

  

 

2002-2004 Active-Venture.com Webhosting Service

 

Disclaimer: This documentation is provided only for the benefits of our hosting customers.
For authoritative source of the documentation, please refer to http://python.org/doc/

 
