Jukka Aho wrote:
When converting Unicode strings to legacy character encodings, it is
possible to register a custom error handler that will catch and process
all code points that do not have a direct equivalent in the target
encoding (as described in PEP 293).

The thing to note here is that the error handler itself is required to
return the substitutions as Unicode strings - not as the target encoding
bytestrings. Some lower-level gadgetry will silently convert these
strings to the target encoding.
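The mechanism described above can be sketched as follows (the handler name
`ascii_fallback` and the "?" substitution are illustrative, not from the
thread):

```python
import codecs

# A PEP 293 error handler: it receives a UnicodeEncodeError and must
# return a (substitution, resume_position) tuple. Note that the
# substitution is a *Unicode* string; the codec machinery converts it
# to the target encoding behind the scenes.
def ascii_fallback(error):
    bad = error.object[error.start:error.end]
    # Return a Unicode substitution, not bytes in the target encoding.
    return ("?" * len(bad), error.end)

codecs.register_error("ascii_fallback", ascii_fallback)

print("smörgåsbord".encode("ascii", "ascii_fallback"))  # b'sm?rg?sbord'
```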

That is, if the substitution _itself_ doesn't contain illegal code
points for the target encoding.

Which brings us to the point: if my error handler for some reason
returns illegal substitutions (from the viewpoint of the target
encoding), how can I catch _these_ errors and make things good again?

I thought it would work automatically, by calling the error handler as
many times as necessary, and letting it work out the situation, but it
apparently doesn't. Sample code follows:

# So the question becomes: how can I make this work
# in a graceful manner?
Replace the return statement with this code:

    return (substitution.encode(error.encoding, "practical").decode(
        error.encoding), error.start + 1)
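A runnable sketch of this fix follows. The handler name "practical" is
taken from the thread; the "<U+XXXX>" substitution format is illustrative.
The key point is that the substitution is round-tripped through the target
encoding using the same handler, so illegal code points inside the
substitution are resolved instead of raising an unhandled error:

```python
import codecs

def practical(error):
    # error is a UnicodeEncodeError with .object, .start, .end, .encoding
    ch = error.object[error.start]
    substitution = "<U+%04X>" % ord(ch)  # any Unicode string
    # Round-trip the substitution through the target encoding with this
    # same handler, so that code points in the substitution that are
    # illegal in the target encoding are themselves substituted.
    safe = substitution.encode(error.encoding, "practical").decode(
        error.encoding)
    return (safe, error.start + 1)

codecs.register_error("practical", practical)

print("price: 10 \u20ac".encode("ascii", "practical"))  # b'price: 10 <U+20AC>'
```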

-- Serge

Discussion Overview
group: python-list
posted: Mar 12, '06 at 7:56p
active: Mar 14, '06 at 2:36p

2 users in discussion: Jukka Aho (2 posts), Serge Orlov (1 post)


