The codecs module defines a set of base classes which define the interface
and can also be used to easily write your own codecs for use in Python.
Each codec has to define four interfaces to make it usable as a codec in Python: stateless
encoder, stateless decoder, stream reader and stream writer. The stream reader and writer
typically reuse the stateless encoder/decoder to implement the file protocols.
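As a minimal sketch of how the four interfaces appear in practice, the snippet below looks up the built-in UTF-8 codec and exercises each one. The sample text and variable names are illustrative assumptions, not part of the interface.

```python
import codecs
import io

# codecs.lookup() returns a CodecInfo object that bundles the four interfaces.
info = codecs.lookup("utf-8")

# Stateless encoder and decoder: each returns (output, length consumed).
encoded, chars_consumed = info.encode("h\u00e9llo")
decoded, bytes_consumed = info.decode(encoded)

# Stream writer and stream reader wrap a file-like object.
buffer = io.BytesIO()
writer = info.streamwriter(buffer)
writer.write("h\u00e9llo")

buffer.seek(0)
reader = info.streamreader(buffer)
round_trip = reader.read()
```

The stream classes here produce the same bytes as the stateless functions, which reflects the reuse described above.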
The Codec class defines the interface for stateless
encoders/decoders.
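The following is a hedged sketch of a stateless codec built on the Codec base class. UpperAsciiCodec and its stream writer are hypothetical names invented for this illustration; the codec simply upper-cases text before encoding it as ASCII.

```python
import codecs
import io


class UpperAsciiCodec(codecs.Codec):
    """Hypothetical stateless codec: ASCII, with text upper-cased on encode."""

    def encode(self, input, errors="strict"):
        # Return (encoded bytes, number of input characters consumed).
        return input.upper().encode("ascii", errors), len(input)

    def decode(self, input, errors="strict"):
        # Return (decoded text, number of input bytes consumed).
        return bytes(input).decode("ascii", errors), len(input)


class UpperAsciiStreamWriter(UpperAsciiCodec, codecs.StreamWriter):
    """Stream writer that reuses the stateless encode() above."""


codec = UpperAsciiCodec()
data, consumed = codec.encode("spam")
text, _ = codec.decode(b"EGGS")

# The stream writer calls the inherited encode() for every write().
sink = io.BytesIO()
UpperAsciiStreamWriter(sink).write("ham")
```

Inheriting from both the codec class and codecs.StreamWriter is one common way to let the stream class pick up the stateless methods without duplicating them.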
To simplify and standardize error handling, the encode() and decode() methods may implement
different error handling schemes, selected through the errors string argument. The following
string values are defined and implemented by all standard Python codecs:
Value | Meaning
'strict' | Raise UnicodeError (or a subclass); this is the default.
'ignore' | Ignore the character and continue with the next.
'replace' | Replace with a suitable replacement character; Python will use the official U+FFFD REPLACEMENT CHARACTER for the built-in Unicode codecs on decoding and '?' on encoding.
'xmlcharrefreplace' | Replace with the appropriate XML character reference (only for encoding).
'backslashreplace' | Replace with backslashed escape sequences (only for encoding).
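The values listed above can be compared side by side with a short sketch. The sample string and the choice of the ASCII codec are assumptions made for illustration.

```python
text = "caf\u00e9"  # the final character cannot be encoded as ASCII

# 'strict' raises UnicodeError (here the subclass UnicodeEncodeError).
try:
    text.encode("ascii", "strict")
    strict_failed = False
except UnicodeError:
    strict_failed = True

ignored = text.encode("ascii", "ignore")               # drops the character
replaced = text.encode("ascii", "replace")             # substitutes '?'
xml_ref = text.encode("ascii", "xmlcharrefreplace")    # numeric XML reference
escaped = text.encode("ascii", "backslashreplace")     # backslash escape

# On decoding, 'replace' substitutes U+FFFD for the undecodable byte.
decoded = b"caf\xe9".decode("ascii", "replace")
```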
The set of allowed values can be extended via register_error.
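As a sketch of extending the set of values, the snippet below registers a custom handler with codecs.register_error(). The handler name 'underscore' and the function underscore_errors are hypothetical, chosen only for this example.

```python
import codecs


def underscore_errors(exc):
    """Hypothetical handler: replace each unencodable character with '_'."""
    if isinstance(exc, UnicodeEncodeError):
        # Return the replacement text and the position at which to resume.
        return "_" * (exc.end - exc.start), exc.end
    # This sketch only handles encoding errors; re-raise anything else.
    raise exc


codecs.register_error("underscore", underscore_errors)

result = "na\u00efve caf\u00e9".encode("ascii", "underscore")
```

Once registered, the new name can be passed as the errors argument wherever the predefined values are accepted.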