Hi,

I'm trying to parse an xml file using SAX. About half-way through a
file I get this error:

Traceback (most recent call last):
  File "C:\Python26\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 325, in RunScript
    exec codeObject in __main__.__dict__
  File "E:\sc\b2.py", line 58, in <module>
    parser.parse(open(r'ppb5.xml'))
  File "C:\Python26\Lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "C:\Python26\Lib\xml\sax\xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "C:\Python26\Lib\xml\sax\expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "C:\Python26\Lib\xml\sax\expatreader.py", line 304, in end_element
    self._cont_handler.endElement(name)
  File "E:\sc\b2.py", line 51, in endElement
    d.write(csv+"\n")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 146-147: ordinal not in range(128)

I'm using ActivePython 2.6. I'm trying to figure out the simplest fix.
If there's a Python way to just take the source XML file and convert/process
it so this will not happen, that would be best. Or should I
just update to Python 3?

I tried this, thinking it might convert the file so that I could then
parse the new file, but nothing changed and it didn't work:

uc = open(r'E:\sc\ppb4.xml').read().decode('utf8')
ascii = uc.decode('ascii')
mex9 = open( r'E:\scrapes\ppb5.xml', 'w' )
mex9.write(ascii)
mex9.close()

Again, I'm looking for something simple, even if it's a few more lines of
code...or upgrade(?)

Thanks, appreciate any help.

  • Goldtech at Nov 30, 2010 at 8:43 pm
  • Steve Holden at Nov 30, 2010 at 9:02 pm

    I'm just as stumped as I was when you first asked this question 13
    minutes ago. ;-)

    regards
    Steve

    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    PyCon 2011 Atlanta March 9-17 http://us.pycon.org/
    See Python Video! http://python.mirocommunity.org/
    Holden Web LLC http://www.holdenweb.com/
  • Goldtech at Nov 30, 2010 at 9:15 pm
    snip...
    I'm just as stumped as I was when you first asked this question 13
    minutes ago. ;-)

    regards
    Steve
    snip...

    Hi Steve,

    Think I found it, for example:

    line = 'my big string'
    line.encode('ascii', 'ignore')

    I processed the problem strings during parsing with this and it works
    now. Got this from:

    http://stackoverflow.com/questions/2365411/python-convert-unicode-to-ascii-without-errors


    Best, Lee

    :^)
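A minimal sketch of what the 'ignore' error handler does to the data, using a hypothetical sample string rather than the real values from the SAX handler. It uses u'' literals so that it is intended to behave the same on Python 2.6 and Python 3. Note that encode() returns a new value and does not change line in place, so the result has to be assigned.

```python
# Hypothetical sample value; the real strings come from the SAX handler.
line = u"Z\u00fcrich, caf\u00e9"

# 'ignore' silently drops every character that ASCII cannot represent.
ignored = line.encode("ascii", "ignore")

# 'replace' keeps a visible '?' placeholder for each such character.
replaced = line.encode("ascii", "replace")

# An explicit encoding such as UTF-8 keeps the data intact.
utf8_bytes = line.encode("utf-8")

print(ignored)
print(replaced)
print(utf8_bytes.decode("utf-8") == line)
```

The error goes away with 'ignore', but so do the non-ASCII characters, which may or may not be acceptable for the resulting CSV.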
  • Stefan Behnel at Dec 1, 2010 at 7:55 am

    goldtech, 30.11.2010 22:15:
    Think I found it, for example:

    line = 'my big string'
    line.encode('ascii', 'ignore')

    I processed the problem strings during parsing with this and it works
    now.

    That's not the right way of dealing with encodings, though. You should open
    the file with a well defined encoding (using codecs.open() or io.open() in
    Python >= 2.6), and then write the unicode strings into it just as you get
    them.

    Stefan
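A sketch of the io.open() approach described above, assuming UTF-8 is an acceptable output encoding. The rows, the temporary output path and the reuse of the names d and csv are illustrative assumptions, not the original script.

```python
import io
import os
import tempfile

# Hypothetical rows standing in for the strings the SAX handler builds.
rows = [u"id,name", u"1,Z\u00fcrich", u"2,caf\u00e9"]

# Hypothetical output path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.csv")

# io.open() encodes every unicode string as UTF-8 on write,
# so no manual .encode() call is needed in the handler.
with io.open(path, "w", encoding="utf-8") as d:
    for csv in rows:
        d.write(csv + u"\n")

# Reading back with the same encoding returns the original text.
with io.open(path, "r", encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines == rows)
```

With the encoding fixed when the file is opened, the handler can keep writing the strings exactly as the parser delivers them.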
  • Ulrich Eckhardt at Dec 1, 2010 at 8:57 am

    goldtech wrote:
    I tried this but nothing changed, I thought this might convert it and
    then I'd paerse the new file - didn't work:

    uc = open(r'E:\sc\ppb4.xml').read().decode('utf8')
    ascii = uc.decode('ascii')
    mex9 = open( r'E:\scrapes\ppb5.xml', 'w' )
    mex9.write(ascii)

    This doesn't make sense either. decode() converts bytes into (Unicode)
    characters. After the first decode('utf8'), you already have those, so
    calling decode('ascii') on the result doesn't make sense. If you want ASCII,
    as the name of the assigned variable suggests, you need to _encode_ the
    string. Be aware that not all characters can be represented in ASCII, though,
    and the presence of such a character seems to have caused your initial problem.
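The direction of the two calls can be sketched as follows. The sample bytes are a hypothetical stand-in for the file contents, and the snippet is intended to run on both Python 2.6 and Python 3.

```python
# Bytes as they might be read from a UTF-8 encoded file (hypothetical sample).
raw = b"caf\xc3\xa9"

# decode(): bytes -> unicode text.
text = raw.decode("utf-8")

# encode(): unicode text -> bytes.
as_utf8 = text.encode("utf-8")

# ASCII cannot represent U+00E9, so a strict encode fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(len(raw), len(text), as_utf8 == raw, ascii_ok)
```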

    BTW:
    - XML is not necessarily UTF-8, but that's a different issue.
    - I would suggest you open files with 'rb' or 'wb' in order to suppress any
    conversions on line endings. Especially writing UTF-16 would fail if that
    is active.

    Good luck!

    Uli

    --
    Domino Laser GmbH
    Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
  • Justin Ezequiel at Nov 30, 2010 at 9:20 pm
    can't check right now but are you sure it's the parser and not
    this line
    d.write(csv+"\n")
    that's failing?
    what is d?
  • Adam Tauno Williams at Dec 1, 2010 at 1:33 pm

    On Tue, 2010-11-30 at 12:28 -0800, goldtech wrote:
    I'm trying to parse an xml file using SAX. About half-way through a
    file I get this error:
    Traceback (most recent call last):
    File "C:\Python26\Lib\site-packages\pythonwin\pywin\framework
    \scriptutils.py", line 325, in RunScript
    exec codeObject in __main__.__dict__
    File "E:\sc\b2.py", line 58, in <module>
    parser.parse(open(r'ppb5.xml'))
    File "C:\Python26\Lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
    File "C:\Python26\Lib\xml\sax\xmlreader.py", line 123, in parse
    self.feed(buffer)
    File "C:\Python26\Lib\xml\sax\expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
    File "C:\Python26\Lib\xml\sax\expatreader.py", line 304, in
    end_element
    self._cont_handler.endElement(name)
    File "E:\sc\b2.py", line 51, in endElement
    d.write(csv+"\n")
    UnicodeEncodeError: 'ascii' codec can't encode characters in position
    146-147: ordinal not in range(128)

    Catch the UnicodeEncodeError exception and display the value of csv.

    Are you certain the error isn't actually in your data? What encoding is
    the source data?

    What is "d"? A file object? Is it in binary mode, or is it StringIO,
    or a codec?
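One way to follow this suggestion is sketched below. Since the thread never says what d is, the sketch assumes an ASCII-only text file purely to reproduce the error, and the value of csv is a hypothetical sample.

```python
import io
import os
import tempfile

# Hypothetical value standing in for the csv string built in endElement.
csv = u"42,Z\u00fcrich"

# Hypothetical output path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.csv")
caught = None

# An ASCII-only file is assumed here only to reproduce the failure.
with io.open(path, "w", encoding="ascii") as d:
    try:
        d.write(csv + u"\n")
    except UnicodeEncodeError as exc:
        caught = exc
        # repr() shows the offending characters as escapes.
        print(repr(csv))
        print(exc)
```

Printing repr(csv) instead of csv itself avoids a second encoding error if the console cannot display the character either.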

    I'm using ActivePython 2.6. I'm trying to figure out the simplest fix.
    If there's a Python way to just take the source XML file and convert/process
    it so this will not happen, that would be best. Or should I
    just update to Python 3?
    I tried this, thinking it might convert the file so that I could then
    parse the new file, but nothing changed and it didn't work:
    uc = open(r'E:\sc\ppb4.xml').read().decode('utf8')
    ascii = uc.decode('ascii')
    mex9 = open( r'E:\scrapes\ppb5.xml', 'w' )
    mex9.write(ascii)
    Again, I'm looking for something simple, even if it's a few more lines of
    code...or upgrade(?)

    If the input data contains characters that cannot be represented in
    ASCII, simply decoding the stream (a) won't fix it and (b) should raise
    an exception.
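Point (b) can be checked with a short sketch. The byte string is a hypothetical sample and not taken from the poster's file.

```python
# Hypothetical UTF-8 bytes containing one non-ASCII character.
raw = b"abc\xc3\xa9def"

try:
    raw.decode("ascii")
    failed_at = None
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte ASCII cannot decode.
    failed_at = exc.start

print(failed_at)
```

The start attribute of the exception plays the same role as the "position 146-147" part of the traceback: it points at the data that does not fit into ASCII.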

Discussion Overview
group: python-list
categories: python
posted: Nov 30, '10 at 8:28p
active: Dec 1, '10 at 1:33p
posts: 8
users: 6
website: python.org
