Grokbase Groups Python tutor May 2011
Hi Everyone,

I am trying to parse an XML feed and display the text of each child node
without any success. My code in the python shell is as follows:
>>> import urllib
>>> from xml.etree import ElementTree as ET
>>> content = urllib.urlopen('http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po')
>>> xml_content = ET.parse(content)

I then check the xml_content object as follows:
>>> xml_content
<xml.etree.ElementTree.ElementTree instance at 0x01DC14B8>

And now, to iterate through its child nodes and print out the text of each
node:
>>> for node in xml_content.getiterator('contest'):
...     name = node.attrib.get('text')
...     print name
...
>>>

Nothing is printed, even though the document does have 'contest' tags with
text in them. If I try to count the contest tags and increment an integer
(to see that the document is traversed) I get the same result - the int
remains at 0.
>>> i = 0
>>> for node in xml_content.getiterator('contest'):
...     i += 1
...
>>> i
0

What am I getting wrong? Any hints would be appreciated.

--
Regards,
Sithembewena Lloyd Dube

  • Alan Gauld at May 24, 2011 at 10:20 am
    "Sithembewena Lloyd Dube" <zebra05 at gmail.com> wrote

    > And now, to iterate through its child nodes and print out the text
    > of each node:
    > >>> for node in xml_content.getiterator('contest'):
    > ...     name = node.attrib.get('text')
    > ...     print name
    > ...
    > Nothing is printed,
    > >>> i = 0
    > >>> for node in xml_content.getiterator('contest'):
    > ...     i += 1
    > ...
    > >>> i
    >
    > What am I getting wrong? Any hints would be appreciated.

    Looks like you are getting an empty list back.
    Try printing list(xml_content.getiterator('contest'))

    And if that's empty, try checking the case of your tag.
    I'm pretty sure the match is case-sensitive.
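
Both checks can be tried on a small in-memory document. This is a minimal sketch, assuming Python 3 (where `getiterator()` no longer exists and `iter()` takes its place) and a made-up sample document rather than the real feed. The helper names are hypothetical.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample document; the real feed's layout is not shown in this thread.
SAMPLE = "<feed><contest><name>A v B</name></contest><Contest/></feed>"

def matching_tags(xml_text, tag):
    """Return the tags of all elements whose name matches `tag` exactly."""
    root = ET.fromstring(xml_text)
    # iter() replaces getiterator(), which was removed in Python 3.9.
    return [elem.tag for elem in root.iter(tag)]

def all_tags(xml_text):
    """Return the sorted distinct tag names, to spot case or spelling mismatches."""
    return sorted({elem.tag for elem in ET.fromstring(xml_text).iter()})

print(matching_tags(SAMPLE, "contest"))
print(matching_tags(SAMPLE, "CONTEST"))
print(all_tags(SAMPLE))
```

Listing every tag that is actually present also shows quickly when the parsed document is not the one you expected.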

    HTH,


    --
    Alan Gauld
    Author of the Learn to Program web site
    http://www.alan-g.me.uk/
  • Stefan Behnel at May 24, 2011 at 10:35 am

    Sithembewena Lloyd Dube, 24.05.2011 11:59:
    > I am trying to parse an XML feed and display the text of each child node
    > without any success. My code in the python shell is as follows:
    > >>> import urllib
    > >>> from xml.etree import ElementTree as ET
    > >>> content = urllib.urlopen('http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po')
    > >>> xml_content = ET.parse(content)
    > I then check the xml_content object as follows:
    > >>> xml_content
    > <xml.etree.ElementTree.ElementTree instance at 0x01DC14B8>

    Well, yes, it does return an XML document, but not what you expect:
    urllib.urlopen('URL see above').read()
    "<response>\r\n <error-message>you must add 'accept-encoding' as
    'gzip,deflate' to the header of your request</error-message>\r
    \n</response>"

    Meaning, the server forces you to pass an HTTP header to the request in
    order to receive gzip compressed data. Once you have that, you must
    decompress it before passing it into ElementTree's parser. See the
    documentation on the gzip and urllib modules in the standard library.
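
A sketch of the steps described above, assuming Python 3 (`urllib.request` rather than the `urllib`/`urllib2` modules used in this thread). The function names are made up for illustration, and the header value is the one quoted in the server's error message.

```python
import gzip
import urllib.request
from xml.etree import ElementTree as ET

def build_request(url):
    """Build a request that advertises gzip support, as the server demands."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip,deflate"})

def parse_gzipped_xml(compressed_bytes):
    """Decompress a gzip payload and parse it into an Element."""
    return ET.fromstring(gzip.decompress(compressed_bytes))

def fetch_feed(url):
    """Fetch and parse the feed; assumes the server really replies with gzip data."""
    with urllib.request.urlopen(build_request(url)) as response:
        return parse_gzipped_xml(response.read())
```

Keeping the decompress-and-parse step separate from the network call makes it easy to try on locally compressed bytes first.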

    Stefan
  • Sithembewena Lloyd Dube at May 25, 2011 at 12:40 pm
    Hi Everyone,

    Thanks for all your suggestions. I read up on gzip and urllib and also
    learned in the process that I could use urllib2, as it's the latest form of
    that library.

    Herewith my solution: I don't know how elegant it is, but it works just
    fine.

    def get_contests():
        url = 'http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po'
        req = urllib2.Request(url)
        req.add_header('accept-encoding','gzip/deflate')
        opener = urllib2.build_opener()
        response = opener.open(req)
        compressed_data = response.read()
        compressed_stream = StringIO.StringIO(compressed_data)
        gzipper = gzip.GzipFile(fileobj=compressed_stream)
        data = gzipper.read()
        current_path = os.path.realpath(MEDIA_ROOT + '/xml-files/d.xml')
        data_file = open(current_path, 'w')
        data_file.write(data)
        data_file.close()
        xml_data = ET.parse(open(current_path, 'r'))
        contest_list = []
        for contest_parent_node in xml_data.getiterator('contest'):
            contest = Contest()
            for contest_child_node in contest_parent_node:
                if (contest_child_node.tag == "name" and
                        contest_child_node.text is not None and contest_child_node.text != ""):
                    contest.name = contest_child_node.text
                if (contest_child_node.tag == "league" and
                        contest_child_node.text is not None and contest_child_node.text != ""):
                    contest.league = contest_child_node.text
                if (contest_child_node.tag == "acro" and
                        contest_child_node.text is not None and contest_child_node.text != ""):
                    contest.acro = contest_child_node.text
                if (contest_child_node.tag == "time" and
                        contest_child_node.text is not None and contest_child_node.text != ""):
                    contest.time = contest_child_node.text
                if (contest_child_node.tag == "home" and
                        contest_child_node.text is not None and contest_child_node.text != ""):
                    contest.home = contest_child_node.text
                if (contest_child_node.tag == "away" and
                        contest_child_node.text is not None and contest_child_node.text != ""):
                    contest.away = contest_child_node.text
            contest_list.append(contest)
        try:
            os.remove(current_path)
        except:
            pass
        return contest_list

    Many thanks!

    --
    Regards,
    Sithembewena Lloyd Dube
  • Stefan Behnel at May 25, 2011 at 1:10 pm

    Sithembewena Lloyd Dube, 25.05.2011 14:40:
    > Thanks for all your suggestions. I read up on gzip and urllib and also
    > learned in the process that I could use urllib2, as it's the latest form of
    > that library.
    >
    > Herewith my solution: I don't know how elegant it is, but it works just
    > fine.
    >
    > def get_contests():
    >     url = 'http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po'
    >     req = urllib2.Request(url)
    >     req.add_header('accept-encoding','gzip/deflate')
    >     opener = urllib2.build_opener()
    >     response = opener.open(req)

    This is ok.

    >     compressed_data = response.read()
    >     compressed_stream = StringIO.StringIO(compressed_data)
    >     gzipper = gzip.GzipFile(fileobj=compressed_stream)
    >     data = gzipper.read()

    This should be simplifiable to

        uncompressed_stream = gzip.GzipFile(fileobj=response)

    >     current_path = os.path.realpath(MEDIA_ROOT + '/xml-files/d.xml')
    >     data_file = open(current_path, 'w')
    >     data_file.write(data)
    >     data_file.close()
    >     xml_data = ET.parse(open(current_path, 'r'))

    And this subsequently becomes

        xml_data = ET.parse(uncompressed_stream)
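
The same idea as a runnable sketch, assuming Python 3 and using an in-memory `io.BytesIO` as a stand-in for the HTTP response object (any file-like object with a `read()` method should do). `parse_gzip_stream` is a hypothetical helper name.

```python
import gzip
import io
from xml.etree import ElementTree as ET

def parse_gzip_stream(fileobj):
    """Parse gzip-compressed XML straight from a file-like object, with no temp file."""
    with gzip.GzipFile(fileobj=fileobj) as uncompressed_stream:
        return ET.parse(uncompressed_stream)

# Stand-in for the network response: gzip-compressed bytes held in memory.
fake_response = io.BytesIO(gzip.compress(b"<feed><contest/><contest/></feed>"))
tree = parse_gzip_stream(fake_response)
print(len(tree.findall("contest")))
```

Because the parser reads from the decompressing stream directly, there is no file to write, reopen, or delete afterwards.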

    >     contest_list = []
    >     for contest_parent_node in xml_data.getiterator('contest'):

    Take a look at ET.iterparse().
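
A sketch of the `iterparse()` approach, assuming Python 3 and a made-up sample document. It handles each `contest` element as soon as its end tag has been read, instead of building the whole tree first. `contest_names` is a hypothetical helper name.

```python
import io
from xml.etree import ElementTree as ET

# Hypothetical sample; the child tag names are taken from the code quoted in this thread.
SAMPLE = (b"<feed><contest><name>A v B</name><league>L1</league></contest>"
          b"<contest><name>C v D</name></contest></feed>")

def contest_names(source):
    """Collect the <name> text of every <contest>, parsing incrementally."""
    names = []
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "contest":
            names.append(elem.findtext("name"))
            elem.clear()  # drop the children once the element has been handled
    return names

print(contest_names(io.BytesIO(SAMPLE)))
```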

    >         contest = Contest()
    >         for contest_child_node in contest_parent_node:
    >             if (contest_child_node.tag == "name" and
    >                     contest_child_node.text is not None and contest_child_node.text != ""):
    >                 contest.name = contest_child_node.text
    >             if (contest_child_node.tag == "league" and
    >                     contest_child_node.text is not None and contest_child_node.text != ""):
    >                 contest.league = contest_child_node.text
    >             if (contest_child_node.tag == "acro" and
    >                     contest_child_node.text is not None and contest_child_node.text != ""):
    >                 contest.acro = contest_child_node.text
    >             if (contest_child_node.tag == "time" and
    >                     contest_child_node.text is not None and contest_child_node.text != ""):
    >                 contest.time = contest_child_node.text
    >             if (contest_child_node.tag == "home" and
    >                     contest_child_node.text is not None and contest_child_node.text != ""):
    >                 contest.home = contest_child_node.text
    >             if (contest_child_node.tag == "away" and
    >                     contest_child_node.text is not None and contest_child_node.text != ""):
    >                 contest.away = contest_child_node.text

    This is screaming for a simplification, such as

        for child in contest_parent_node:
            if child.tag in ('name', 'league', ...): # etc.
                if child.text:
                    setattr(contest, child.tag, child.text)
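
Fleshed out into something runnable, under stated assumptions: Python 3, a stand-in `Contest` class (the real one is not shown in the thread), and the six field names from the original `if` chain. `contest_from_element` and `FIELDS` are hypothetical names.

```python
from xml.etree import ElementTree as ET

class Contest:
    """Stand-in for the poster's Contest class, which is not shown in the thread."""

# The six child tags handled by the original if chain.
FIELDS = ("name", "league", "acro", "time", "home", "away")

def contest_from_element(contest_node):
    """Copy the non-empty text of known child tags onto a new Contest."""
    contest = Contest()
    for child in contest_node:
        # child.text is falsy for both None and "", so one check covers both cases.
        if child.tag in FIELDS and child.text:
            setattr(contest, child.tag, child.text)
    return contest

node = ET.fromstring("<contest><name>A v B</name><league/><odds>2.0</odds></contest>")
contest = contest_from_element(node)
print(vars(contest))
```

Unknown tags and empty elements are skipped, so the resulting object only carries attributes that had text in the document.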


    Stefan
  • Sithembewena Lloyd Dube at Jun 10, 2011 at 2:59 pm
    Hi Stefan,

    Thanks for the code review :) Only just noticed this.

    --
    Regards,
    Sithembewena Lloyd Dube

Discussion Overview
group: tutor
category: python
posted: May 24, '11 at 9:59a
active: Jun 10, '11 at 2:59p
posts: 6
users: 3
website: python.org
