I'm using feedparser to parse the following:

<div class="indent text">Adv: Termite Inspections! Jenny Moyer welcomes
you to her HomeFinderResource.com TM A "MUST See &amp;hellip;</div>

I'm receiving the following error when I try to print what feedparser
parsed from the above text:

UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in
position 86: ordinal not in range(256)

Why is this happening and where does the problem lie?

thanks


  • Deelan at Jun 7, 2005 at 10:22 am

    fingermark at gmail.com wrote:
    Why is this happening and where does the problem lie?
    The Unicode character U+201C isn't part of the latin-1 charset,
    which only covers code points 0 to 255. See:

    "LEFT DOUBLE QUOTATION MARK"
    <http://www.fileformat.info/info/unicode/char/201c/index.htm>

    try to encode the feedparser output to UTF-8 instead, or
    use the "replace" option for the encode() method.
    >>> c = u'\u201c'
    >>> c
    u'\u201c'
    >>> c.encode('utf-8')
    '\xe2\x80\x9c'
    >>> print c.encode('utf-8')

    ok, let's try replace:

    >>> c.encode('latin-1', 'replace')
    '?'

    Using "replace" will not throw an error, but it will replace
    the offending character with a question mark.

    HTH.
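
To make the difference between the error handlers concrete, here is a minimal sketch that runs the same encode with each of the standard handlers. It is written for Python 3 so it can be run today (the session above is Python 2, where the results are `str` rather than `bytes`); the sample string `text` and the `results` dict are made up for illustration.

```python
# Compare codec error handlers on a string that contains U+201C.
text = u'A \u201cMUST See'

handlers = ('strict', 'replace', 'ignore', 'xmlcharrefreplace')
results = {}
for handler in handlers:
    try:
        results[handler] = text.encode('latin-1', handler)
    except UnicodeEncodeError as exc:
        # 'strict' is the default handler and raises, as in the traceback.
        results[handler] = exc

for handler in handlers:
    print(handler, repr(results[handler]))
```

Only 'strict' raises. 'replace' and 'ignore' lose the character, while 'xmlcharrefreplace' keeps it as a numeric character reference, which may be the better choice if the output ends up in HTML.
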
  • Fingermark at Jun 7, 2005 at 8:11 pm
    why is it even trying latin-1 at all? I don't see it anywhere in
    feedparser.py or my code.

  • Jarek Zgoda at Jun 7, 2005 at 8:35 pm

    fingermark at gmail.com wrote:

    why is it even trying latin-1 at all? I don't see it anywhere in
    feedparser.py or my code.
    Check your site.py or sitecustomize.py module; you may have a
    non-standard default encoding set there.
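
A quick way to see whether the default encoding has been changed is to ask the interpreter directly. This is a small sketch using `sys.getdefaultencoding()`; the variable name `default_encoding` is just for illustration. On an unmodified Python 2 install it should report 'ascii', and on Python 3 it reports 'utf-8'.

```python
import sys

# The interpreter-wide default used for implicit str/unicode conversions.
# In Python 2 a site.py or sitecustomize.py module can change this value.
default_encoding = sys.getdefaultencoding()
print('default encoding:', default_encoding)
```
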
  • John Roth at Jun 7, 2005 at 9:32 pm
    <fingermark at gmail.com> wrote in message
    news:1118135690.961381.207490 at o13g2000cwo.googlegroups.com...
    Why is this happening and where does the problem lie?
    Several different things are going on here. First, when you try to
    print a unicode string using str() or a similar function, Python is going to
    use the default encoding to render it. The default encoding is usually
    ASCII-7. Why it's trying to use Latin-1 in this case is somewhat
    of a mystery.

    The quote in front of the word MUST is a "smart quote", that is a
    curly quote, and it is not a valid character in either ASCII or
    Latin-1. Use Windows-1252 explicitly, and it should render
    properly. Alternatively use UTF-8, as one of the other posters
    suggested. Then it's up to whatever software you use to actually
    put the ink on the paper to render it properly, but that's a different
    issue.
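
The claim about which charsets contain the curly quote can be checked with a short sketch like the one below. It is written for Python 3, and the helper `can_encode` and the names `quote`, `support` and `cp1252_bytes` are hypothetical, introduced only for this example.

```python
quote = u'\u201c'  # LEFT DOUBLE QUOTATION MARK

def can_encode(char, codec):
    """Return True if `codec` has a byte sequence for `char`."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Try the four encodings mentioned in this thread.
support = {codec: can_encode(quote, codec)
           for codec in ('ascii', 'latin-1', 'cp1252', 'utf-8')}
print(support)

# Windows-1252 maps the curly quote to a single byte.
cp1252_bytes = quote.encode('cp1252')
print(repr(cp1252_bytes))
```

ASCII and Latin-1 both fail, while Windows-1252 and UTF-8 succeed.
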

    John Roth
  • Kent Johnson at Jun 8, 2005 at 12:44 am

    John Roth wrote:
    Several different things are going on here. First, when you try to
    print a unicode string using str() or a similar function, Python is going to
    use the default encoding to render it. The default encoding is usually
    ASCII-7. Why it's trying to use Latin-1 in this case is somewhat
    of a mystery.

    Actually I believe it will use sys.stdout.encoding for this, which is presumably latin-1 on fingermark's machine.

    Kent
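
If the console encoding is the culprit, one possible workaround is to encode explicitly for whatever `sys.stdout.encoding` reports, with "replace" as a safety net. The sketch below is written for Python 3; `encode_for_stdout` and its `fallback` parameter are hypothetical names, not part of feedparser or the standard library.

```python
import sys

def encode_for_stdout(text, fallback='utf-8'):
    """Encode `text` for the console, replacing unencodable characters."""
    # sys.stdout.encoding can be None when output is piped or redirected.
    encoding = getattr(sys.stdout, 'encoding', None) or fallback
    return text.encode(encoding, 'replace'), encoding

data, used = encode_for_stdout(u'A \u201cMUST See')
print(used, repr(data))
```

On a latin-1 console this would give a question mark in place of the curly quote instead of raising UnicodeEncodeError.
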

Discussion Overview
group: python-list
categories: python
posted: Jun 7, '05 at 10:02a
active: Jun 8, '05 at 12:44a
posts: 6
users: 5
website: python.org