Hi!

I need to parse several XML documents into a Python dictionary. Is there a module that would be particularly good for this? I heard beginners should start with ElementTree. However, SAX seems to make a little more sense to me. Any suggestions?

  • Eric Pavey at Nov 10, 2009 at 4:15 am

    On Mon, Nov 9, 2009 at 7:48 PM, Christopher Spears wrote:

    > Hi!
    >
    > I need to parse several XML documents into a Python dictionary. Is there a
    > module that would be particularly good for this? I heard beginners should
    > start with ElementTree. However, SAX seems to make a little more sense to
    > me. Any suggestions?
    I'd recommend ElementTree. I started out with minidom and wanted to rip my
    face off. Ok, exaggerating, but ElementTree made a lot more sense.
    2c
  • Alan Gauld at Nov 10, 2009 at 5:53 am
    "Christopher Spears" <cspears2002 at yahoo.com> wrote
    > I need to parse several XML documents into a Python dictionary.
    > Is there a module that would be particularly good for this?
    > I heard beginners should start with ElementTree.
    > However, SAX seems to make a little more sense to me.
    XML parsers fall into 2 groups. Those that parse the whole
    structure and create a tree of objects - usually accessed like
    a dictionary, and those that parse line by line looking for patterns.
    ElementTree is of the former, sax of the latter.

    The former approach is usually slightly slower and more resource
    hungry but is much more flexible. SAX is fast but generally best
    if you only want to read something specific out of the XML.

    If SAX makes sense for you and meets your needs go with it.

    But ElementTree is worth persevering with if you need to do
    more complex editing of the XML. It's certainly easier than minidom.
    (The other standard tree parser in Python)
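
As a rough illustration of the tree approach described above, here is a minimal sketch that reads a whole document with ElementTree and turns it into a dictionary. The sample XML, its element names and the `books_to_dict` helper are made up for the example (the original question did not show its data), and the code is written for Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are invented.
SAMPLE = """<library>
  <book id="1"><title>Dive Into Python</title><year>2004</year></book>
  <book id="2"><title>Learning Python</title><year>2009</year></book>
</library>"""


def books_to_dict(xml_text):
    """Parse the whole document into a tree, then build {id: {tag: text}}."""
    root = ET.fromstring(xml_text)
    result = {}
    for book in root.findall("book"):
        # Each child element becomes a key and its text becomes the value.
        result[book.get("id")] = {child.tag: child.text for child in book}
    return result


books = books_to_dict(SAMPLE)
print(books["1"]["title"])
```

Note that every value comes back as a string; converting `year` to an integer would be a separate step.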

    Alan G.
  • Stefan Behnel at Nov 10, 2009 at 10:19 am

    Alan Gauld, 10.11.2009 06:53:
    > "Christopher Spears" <cspears2002 at yahoo.com> wrote
    >> I need to parse several XML documents into a Python dictionary. Is
    >> there a module that would be particularly good for this? I heard
    >> beginners should start with ElementTree. However, SAX seems to make a
    >> little more sense to me.
    Note that ElementTree provides both a SAX-like interface (look for the
    'target' property of parsers) and an incremental parser (iterparse). So the
    question is not "ElementTree or SAX?", it's more like "how much time do I
    have to implement, run and maintain the code?".
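
The two ElementTree interfaces mentioned above can be sketched roughly as follows. This is only an illustration for Python 3: the sample data, the `TitleCollector` class and both helper functions are hypothetical names, not something taken from the thread. `iterparse` yields elements as they are completed, while a parser target receives `start`/`data`/`end` callbacks much like a SAX handler.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are invented for the example.
SAMPLE = (
    b"<library>"
    b"<book><title>Dive Into Python</title></book>"
    b"<book><title>Learning Python</title></book>"
    b"</library>"
)


def titles_incremental(stream):
    """Collect title text with iterparse, discarding finished books."""
    titles = []
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "title":
            titles.append(elem.text)
        elif elem.tag == "book":
            elem.clear()  # drop the book's children once it is complete
    return titles


class TitleCollector:
    """Parser target: the parser calls these methods as it sees events."""

    def __init__(self):
        self.titles = []
        self._chunks = None  # a list while inside <title>, else None

    def start(self, tag, attrib):
        if tag == "title":
            self._chunks = []

    def data(self, text):
        if self._chunks is not None:
            self._chunks.append(text)

    def end(self, tag):
        if tag == "title":
            self.titles.append("".join(self._chunks))
            self._chunks = None

    def close(self):
        # Whatever is returned here is the result of parser.close().
        return self.titles


def titles_with_target(xml_bytes):
    """Feed the document to a parser that reports events to the target."""
    parser = ET.XMLParser(target=TitleCollector())
    parser.feed(xml_bytes)
    return parser.close()


print(titles_incremental(io.BytesIO(SAMPLE)))
print(titles_with_target(SAMPLE))
```

With a custom target no tree is built at all, so memory use stays small even for large inputs; with `iterparse` the tree is built but can be pruned as you go.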

    > XML parsers fall into 2 groups. Those that parse the whole structure and
    > create a tree of objects - usually accessed like a dictionary, and those
    > that parse line by line looking for patterns.
    Except that parsing XML is not about lines but about bytes in a stream.

    > The former approach is usually slightly slower and more resource hungry
    I'd better leave the judgement about this statement to a benchmark.

    > If SAX makes sense for you and meets your needs go with it.
    I'd change this to:

    Unless you really know what you are doing and you have proven in benchmarks
    that SAX is substantially faster for the problem at hand, don't use SAX.

    Stefan
  • Alan Gauld at Nov 10, 2009 at 11:09 pm
    "Stefan Behnel" <stefan_ml at behnel.de> wrote
    > Note that ElementTree provides both a SAX-like interface (look for the
    > 'target' property of parsers) and an incremental parser (iterparse).
    Interesting, I didn't realise that.
    I've only ever used it to build a tree.
    >> XML parsers fall into 2 groups. Those that parse the whole structure and
    >> create a tree of objects - usually accessed like a dictionary, and those
    >> that parse line by line looking for patterns.
    > Except that parsing XML is not about lines but about bytes in a stream.
    Indeed, I should probably have said element by element.
    >> The former approach is usually slightly slower and more resource hungry
    > I'd better leave the judgement about this statement to a benchmark.
    It depends on what you are doing obviously. If you need to parse the whole
    message there will be very little difference, but a sax style parser often
    can complete its job after reading a short section of the document.
    Tree parsers generally require the whole document to be completed
    to finish building the tree.
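
The point about a sax style parser finishing early could look something like the sketch below. It is one possible way to do it, not the only one: the handler raises a private exception as soon as the first title has been read, which abandons the rest of the parse. The sample data and the names `Found`, `FirstTitleHandler` and `first_title` are hypothetical, and the code assumes Python 3.

```python
import io
import xml.sax

# Hypothetical sample document; element names are invented for the example.
SAMPLE = (
    b"<library>"
    b"<book><title>Dive Into Python</title><year>2004</year></book>"
    b"<book><title>Learning Python</title><year>2009</year></book>"
    b"</library>"
)


class Found(Exception):
    """Raised by the handler to abandon parsing once we have our answer."""


class FirstTitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.title = None
        self._chunks = None  # a list while inside <title>, else None

    def startElement(self, name, attrs):
        if name == "title":
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.title = "".join(self._chunks)
            raise Found  # nothing after this point needs to be looked at


def first_title(stream):
    """Return the text of the first <title>, or None if there is none."""
    handler = FirstTitleHandler()
    try:
        xml.sax.parse(stream, handler)
    except Found:
        pass
    return handler.title


print(first_title(io.BytesIO(SAMPLE)))
```

Whether this saves a noticeable amount of time depends on how large the document is and how early the wanted element appears, so it is worth measuring before relying on it.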
    >> If SAX makes sense for you and meets your needs go with it.
    > I'd change this to:
    >
    > Unless you really know what you are doing and you have proven in
    > benchmarks that SAX is substantially faster for the problem at hand,
    > don't use SAX.
    Even if speed is not the critical factor, if sax makes more sense to you
    than ElementTree, and it will do what you want, use sax. There are plenty
    of industrial strength applications using sax parsers, and if the
    requirement is simple it is no harder to maintain than a poorly understood
    ElementTree implementation!

    Personally I find ElementTree easier to work with, but if the OP prefers
    sax and can make it work for him then there is nothing wrong with using it.

    And familiarity with sax style parsers is arguably a more transferable
    skill than ElementTree should he need to work with C or Pascal - or
    even Java etc.

    --
    Alan Gauld
    Author of the Learn to Program web site
    http://www.alan-g.me.uk/

Discussion Overview
group: tutor @
categories: python
posted: Nov 10, '09 at 3:48a
active: Nov 10, '09 at 11:09p
posts: 5
users: 4
website: python.org
