Hi Everyone,
Thanks for all your suggestions. I read up on gzip and urllib, and learned
in the process that I could use urllib2, as it is the newer form of that
library.
Here is my solution. I don't know how elegant it is, but it works just
fine.
import gzip
import os
import StringIO
import urllib2
# Assumption: ET is the standard library ElementTree.
import xml.etree.ElementTree as ET

# MEDIA_ROOT and Contest come from the surrounding project and are not shown here.

def get_contests():
    url = 'http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po'
    # The server only answers requests that carry an accept-encoding header.
    req = urllib2.Request(url)
    req.add_header('accept-encoding','gzip/deflate')
    opener = urllib2.build_opener()
    response = opener.open(req)
    # The body arrives gzip-compressed; decompress it in memory.
    compressed_data = response.read()
    compressed_stream = StringIO.StringIO(compressed_data)
    gzipper = gzip.GzipFile(fileobj=compressed_stream)
    data = gzipper.read()
    # Write the XML to a temporary file, then parse it from there.
    current_path = os.path.realpath(MEDIA_ROOT + '/xml-files/d.xml')
    data_file = open(current_path, 'w')
    data_file.write(data)
    data_file.close()
    xml_data = ET.parse(current_path)
    contest_list = []
    for contest_parent_node in xml_data.getiterator('contest'):
        contest = Contest()
        # Copy the text of each known child tag, skipping empty nodes.
        for contest_child_node in contest_parent_node:
            if (contest_child_node.tag == "name" and
                    contest_child_node.text is not None and contest_child_node.text != ""):
                contest.name = contest_child_node.text
            if (contest_child_node.tag == "league" and
                    contest_child_node.text is not None and contest_child_node.text != ""):
                contest.league = contest_child_node.text
            if (contest_child_node.tag == "acro" and
                    contest_child_node.text is not None and contest_child_node.text != ""):
                contest.acro = contest_child_node.text
            if (contest_child_node.tag == "time" and
                    contest_child_node.text is not None and contest_child_node.text != ""):
                contest.time = contest_child_node.text
            if (contest_child_node.tag == "home" and
                    contest_child_node.text is not None and contest_child_node.text != ""):
                contest.home = contest_child_node.text
            if (contest_child_node.tag == "away" and
                    contest_child_node.text is not None and contest_child_node.text != ""):
                contest.away = contest_child_node.text
        contest_list.append(contest)
    # Clean up the temporary file; ignore the error if it is already gone.
    try:
        os.remove(current_path)
    except OSError:
        pass
    return contest_list
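
The six near-identical checks in the loop could probably be collapsed into one lookup. What follows is only a sketch in Python 3, not the code that was actually run: the Contest class here is a hypothetical stand-in for the project's own class, the sample XML is made up (the real feed's layout is not shown in this thread), and the helper name contests_from_xml is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Tag names taken from the get_contests() function above.
FIELDS = ("name", "league", "acro", "time", "home", "away")


class Contest(object):
    """Hypothetical stand-in for the project's Contest class."""


def contests_from_xml(xml_text):
    """Build one Contest per <contest> element, copying non-empty child text."""
    contests = []
    for node in ET.fromstring(xml_text).iter("contest"):
        contest = Contest()
        for child in node:
            # Skip tags we do not care about and children without text.
            if child.tag in FIELDS and child.text:
                setattr(contest, child.tag, child.text)
        contests.append(contest)
    return contests


# Made-up sample document, only for illustration.
SAMPLE = """<response>
  <contest><name>A v B</name><league>Test League</league><home>A</home><away>B</away><acro></acro></contest>
  <contest><name>C v D</name><time>12:00</time></contest>
</response>"""

contests = contests_from_xml(SAMPLE)
```

Parsing from a string also avoids writing and deleting the temporary file, assuming the decompressed data fits comfortably in memory.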
Many thanks!
On Tue, May 24, 2011 at 12:35 PM, Stefan Behnel wrote:
Sithembewena Lloyd Dube, 24.05.2011 11:59:
I am trying to parse an XML feed and display the text of each child node
Well, yes, it does return an XML document, but not what you expect:
urllib.urlopen('URL see above').read()
"<response>\r\n <error-message>you must add 'accept-encoding' as 'gzip,deflate' to the header of your request</error-message>\r\n</response>"
Meaning, the server forces you to pass an HTTP header to the request in
order to receive gzip compressed data. Once you have that, you must
decompress it before passing it into ElementTree's parser. See the
documentation on the gzip and urllib modules in the standard library.
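
The two steps described here (send the header, then decompress before parsing) might look roughly like the sketch below. It is written for Python 3, where urllib.request takes the place of urllib2, and it makes no network call: the compressed body is a made-up document gzipped locally as a stand-in for response.read(), and the helper names build_request and parse_gzipped_xml as well as the example.invalid URL are invented for illustration.

```python
import gzip
import io
import urllib.request
import xml.etree.ElementTree as ET


def build_request(url):
    """Create a request that announces gzip/deflate support to the server."""
    req = urllib.request.Request(url)
    req.add_header("Accept-Encoding", "gzip,deflate")
    return req


def parse_gzipped_xml(compressed_bytes):
    """Decompress a gzip body in memory and return the XML root element."""
    with gzip.GzipFile(fileobj=io.BytesIO(compressed_bytes)) as gz:
        return ET.fromstring(gz.read())


# Stand-in for response.read(): a made-up document compressed locally.
fake_body = gzip.compress(
    b"<response><contest><name>A v B</name></contest></response>"
)
root = parse_gzipped_xml(fake_body)

# Building the request does not contact the server; the URL is a placeholder.
req = build_request("http://example.invalid/feed")
```

Against a real server, the bytes passed to parse_gzipped_xml would come from urllib.request.urlopen(req).read() instead of fake_body. This sketch handles gzip only; a deflate-encoded reply would need zlib instead.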
Stefan
--
Regards,
Sithembewena Lloyd Dube