Hi,

I'm using python 3.2 and got the following error:
>>> nntpClient = nntplib.NNTP_SSL(...)
>>> nntpClient.group("alt.binaries.cd.lossless")
>>> nntpClient.over((534157,534157))
... 'subject': 'Myl\udce8ne Farmer - Anamorphosee (Japan Edition) 1995
[02/41] "Back.jpg" yEnc (1/3)' ...
>>> overview = nntpClient.over((534157,534157))
>>> print(overview[1][0][1]['subject'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce8' in position 3: surrogates not allowed
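
A minimal sketch that reproduces the print() failure without an NNTP server. It assumes, as MRAB's reply below suggests, that the subject arrived as Latin-1 bytes and was decoded as UTF-8 with errors="surrogateescape". An io.TextIOWrapper over an in-memory buffer stands in for a UTF-8 sys.stdout using the default "strict" handler; the helper name print_to_bytes is hypothetical.

```python
import io

# Assumed input: byte 0xE8 (Latin-1 "è") decoded as UTF-8 with
# errors="surrogateescape" ends up in the str as the lone surrogate U+DCE8.
subject = b"Myl\xe8ne Farmer".decode("utf-8", "surrogateescape")


def print_to_bytes(text, errors):
    """Print text through a UTF-8 text stream and return the bytes written."""
    buf = io.BytesIO()
    # Plays the role of sys.stdout; newline="\n" turns off newline translation.
    out = io.TextIOWrapper(buf, encoding="utf-8", errors=errors, newline="\n")
    print(text, file=out)
    out.flush()
    return buf.getvalue()


try:
    print_to_bytes(subject, "strict")  # the default handler
    strict_ok = True
except UnicodeEncodeError as exc:
    strict_ok = False
    print("strict:", exc.reason)  # strict: surrogates not allowed

# "backslashreplace" writes an ASCII escape instead of raising.
print(print_to_bytes(subject, "backslashreplace"))  # b'Myl\\udce8ne Farmer\n'
```

Whether a real console is strict depends on the Python version and locale, and "backslashreplace" only helps for display: it does not recover the original character.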

I'm not sure if I should report this as a bug in nntplib or if I'm
doing something wrong.

Note that I get the same error if I try to write this data to a file:
>>> h = open("output.txt", "a")
>>> h.write(overview[1][0][1]['subject'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce8' in position 3: surrogates not allowed
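
The file case is the same mechanism: open() encodes with the "strict" handler unless told otherwise. This sketch assumes the same Latin-1 origin for the subject and uses a throwaway temporary file in place of output.txt; it shows that passing errors="surrogateescape" to open() writes the original byte 0xE8 back out instead of raising.

```python
import os
import tempfile

# Assumed input: Latin-1 bytes decoded as UTF-8 with errors="surrogateescape".
subject = b"Myl\xe8ne Farmer".decode("utf-8", "surrogateescape")
path = os.path.join(tempfile.mkdtemp(), "output.txt")  # throwaway file

# Default errors="strict" reproduces the failure from the post.
try:
    with open(path, "a", encoding="utf-8") as h:
        h.write(subject)
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# errors="surrogateescape" turns U+DCE8 back into the original byte 0xE8.
with open(path, "a", encoding="utf-8", errors="surrogateescape") as h:
    h.write(subject)

with open(path, "rb") as h:
    data = h.read()
print(data)  # b'Myl\xe8ne Farmer'
```

The trade-off is that the file then holds a raw Latin-1 byte and is no longer valid UTF-8; decoding the subject correctly first, as the replies describe, avoids that.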

Thanks,
Laurent


  • MRAB at Feb 28, 2011 at 2:12 am

    On 28/02/2011 01:31, Laurent Duchesne wrote:
    > I'm using python 3.2 and got the following error:
    > UnicodeEncodeError: 'utf-8' codec can't encode character '\udce8' in
    > position 3: surrogates not allowed

    It looks like the subject was originally encoded as Latin-1 (or
    similar), i.e. b'Myl\xe8ne Farmer - Anamorphosee (Japan Edition) 1995
    [02/41] "Back.jpg" yEnc (1/3)', but was decoded as UTF-8 with
    "surrogateescape" passed as the "errors" parameter.

    You can get the "correct" Unicode by encoding as UTF-8 with
    "surrogateescape" and then decoding as Latin-1:

        overview[1][0][1]['subject'].encode("utf-8", "surrogateescape").decode("latin-1")
  • Thomas L. Shinnick at Feb 28, 2011 at 3:26 am

    At 08:12 PM 2/27/2011, you wrote:
    > It looks like the subject was originally encoded as Latin-1 (or
    > similar), i.e. b'Myl\xe8ne Farmer - Anamorphosee (Japan Edition) 1995
    > [02/41] "Back.jpg" yEnc (1/3)', but was decoded as UTF-8 with
    > "surrogateescape" passed as the "errors" parameter.

    3.2 docs, 6.6. codecs - Codec registry and base classes:
        Possible values for errors are
        'surrogateescape': replace with surrogate U+DCxx, see PEP 383

    Yes, it would have been 0xE8 - Mylène.

    Googling "surrogateescape", I can see lots of
    arguments about unintended outcomes... yikes!
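
To make "U+DCxx" concrete: under PEP 383, a byte that UTF-8 cannot decode (always in the range 0x80 to 0xFF) is mapped to the lone surrogate U+DC00 plus the byte value, which is why 0xE8 shows up as '\udce8'. A short check of that mapping, and of the lossless round trip it provides:

```python
# PEP 383: a byte UTF-8 cannot decode (always 0x80..0xFF) becomes the lone
# surrogate U+DC00 + byte value, i.e. somewhere in U+DC80..U+DCFF ("U+DCxx").
for b in range(0x80, 0x100):
    assert bytes([b]).decode("utf-8", "surrogateescape") == chr(0xDC00 + b)

# The byte from this thread: 0xE8 -> U+DCE8.
print(hex(ord(b"\xe8".decode("utf-8", "surrogateescape"))))  # 0xdce8

# Encoding with the same handler reverses the mapping, so arbitrary bytes
# survive the trip through str unchanged.
blob = bytes(range(256))
text = blob.decode("utf-8", "surrogateescape")
assert text.encode("utf-8", "surrogateescape") == blob
```

That round trip is the whole point of the handler, and also the source of the surprises: the str looks normal until something tries to encode it strictly.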

    > You can get the "correct" Unicode by encoding as UTF-8 with
    > "surrogateescape" and then decoding as Latin-1:
    >
    >     overview[1][0][1]['subject'].encode("utf-8", "surrogateescape").decode("latin-1")
  • Laurent Duchesne at Feb 28, 2011 at 5:49 pm
    Hi,

    Thanks, it's working!
    But is it "normal" for a string returned by a module (nntplib) to make
    print() or write() fail?

    I'm just asking so I know whether to open a bug report :)

    I'm also wondering which strings should be re-encoded using the
    surrogateescape error handler and which should not. I guess I could
    re-encode them all and it wouldn't cause any problems?

    Laurent
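
On that last question, blindly re-encoding every string is probably not safe. A subject that was valid UTF-8 contains no escaped surrogates, and pushing it through encode("utf-8", "surrogateescape").decode("latin-1") turns "è" into "Ã¨". A minimal sketch with a hypothetical helper, repair_subject, that only applies the fix when surrogateescape marks (U+DC80 to U+DCFF) are present, assuming Latin-1 as the fallback charset:

```python
def repair_subject(s, fallback="latin-1"):
    """Hypothetical helper: undo a UTF-8 + surrogateescape decode only if needed."""
    # surrogateescape leaves lone surrogates U+DC80..U+DCFF behind; a string
    # without them decoded cleanly as UTF-8 and is left alone.
    if not any("\udc80" <= ch <= "\udcff" for ch in s):
        return s
    # Restore the original bytes, then decode with the guessed legacy charset.
    return s.encode("utf-8", "surrogateescape").decode(fallback)


latin1_subject = b"Myl\xe8ne".decode("utf-8", "surrogateescape")  # escaped
utf8_subject = b"Myl\xc3\xa8ne".decode("utf-8", "surrogateescape")  # clean

print(repair_subject(latin1_subject) == repair_subject(utf8_subject))  # True

# Re-encoding everything blindly mangles the string that was already fine.
blind = utf8_subject.encode("utf-8", "surrogateescape").decode("latin-1")
print(ascii(blind))  # 'Myl\xc3\xa8ne'
```

This is still a heuristic: Latin-1 never fails to decode, so a subject in another legacy charset (cp1252, for instance) would be silently mis-decoded, and a subject mixing UTF-8 and Latin-1 bytes would come out only partly right.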
    On Mon, 28 Feb 2011 02:12:20 +0000, MRAB wrote:
    > You can get the "correct" Unicode by encoding as UTF-8 with
    > "surrogateescape" and then decoding as Latin-1:
    >
    >     overview[1][0][1]['subject'].encode("utf-8", "surrogateescape").decode("latin-1")

Discussion Overview
group: python-list @
categories: python
posted: Feb 28, '11 at 1:31a
active: Feb 28, '11 at 5:49p
posts: 4
users: 3
website: python.org

site design / logo © 2022 Grokbase