When encoding a Unicode string into a byte string, unencodable characters may be
encountered. So far, Python has allowed specifying the error processing as either
``strict'' (raising UnicodeError), ``ignore'' (skipping the
character), or ``replace'' (using a question mark in the output string), with ``strict''
being the default behavior. It may be desirable to specify alternative processing of such
errors, such as inserting an XML character reference or HTML entity reference into the
converted string.
Python now has a flexible framework for adding different processing strategies. New error
handlers can be registered with codecs.register_error(), and codecs
can then look up an error handler by name with codecs.lookup_error(). An
equivalent C API has been added for codecs written in C. The error handler receives the
necessary state information, such as the string being converted, the position in the string
where the error was detected, and the target encoding. The handler can then either raise
an exception or return a replacement string together with the position at which encoding
should resume.
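As an illustration of the HTML entity case mentioned above, here is a minimal sketch of a custom handler. It is written in Python 3 syntax and assumes a made-up handler name, "htmlentityreplace", that is not part of the standard library. The handler is called with a UnicodeEncodeError whose object, start and end attributes describe the failing span, and it returns a (replacement, resume_position) tuple.

```python
import codecs
from html.entities import codepoint2name  # Python 3 stdlib table of HTML 4 entity names


def html_entity_replace(exc):
    """Hypothetical handler: replace unencodable characters with HTML entities."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc  # this sketch only handles encoding errors
    pieces = []
    # exc.object is the string being encoded; [start:end] is the unencodable span.
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        # Use a named entity when one exists, else a numeric character reference.
        pieces.append(f"&{name};" if name else f"&#{ord(ch)};")
    # Return the replacement text and the position where encoding should resume.
    return ("".join(pieces), exc.end)


# Register under a (hypothetical) name; codecs find it later via lookup_error().
codecs.register_error("htmlentityreplace", html_entity_replace)

encoded = "caf\u00e9 \u2603".encode("ascii", errors="htmlentityreplace")
print(encoded)  # b'caf&eacute; &#9731;'
```

The snowman (U+2603) has no HTML 4 entity name, so the sketch falls back to a numeric reference for it.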
Two additional error handlers have been implemented using this framework: ``backslashreplace''
uses Python backslash quoting to represent unencodable characters and ``xmlcharrefreplace''
emits XML character references.
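A short sketch, in Python 3 syntax, comparing the built-in handlers on the same input when encoding to ASCII; the sample string is arbitrary.

```python
text = "na\u00efve \u20ac5"  # contains U+00EF and U+20AC, neither encodable in ASCII

results = {}
for errors in ("ignore", "replace", "backslashreplace", "xmlcharrefreplace"):
    results[errors] = text.encode("ascii", errors=errors)
    print(f"{errors:18} {results[errors]!r}")

try:
    text.encode("ascii")  # errors="strict" is the default
except UnicodeEncodeError as exc:
    # UnicodeEncodeError is a subclass of UnicodeError.
    print("strict raised", type(exc).__name__, exc.start, exc.end)
```

Note that "backslashreplace" uses \x escapes for code points below U+0100 and \u escapes for the rest of the Basic Multilingual Plane.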