Dear tutors,

I use ElementTree for XML work. I have a 1.3GB file
to parse.

It takes a lot of time to open my input XML file.

Is that because of my hardware limitations, or am I
using a blunt method to load the file?

my computer config:
Intel(R) Pentium(R) 4 CPU 2.80GHz,
2.79GHz, 0.99GB of RAM

from elementtree import ElementTree
myfile = open('myXML.out','r')

Do you suggest any tips to circumvent the file opening
problem?

thanks
Srini




  • Kent Johnson at Nov 21, 2007 at 5:21 pm

    Srinivas Iyyer wrote:
    > I use ElementTree for XML work. I have a 1.3GB file
    > to parse.
    > [...]
    > from elementtree import ElementTree
    > myfile = open('myXML.out','r')

    Reading a 1.3 GB file on a machine with 0.99 GB of RAM is certainly pushing
    things. Parsing it into an ElementTree will probably double or triple
    your memory requirements.

    > Do you suggest any tips to circumvent the file opening
    > problem?

    Do you need the whole parsed tree at once, or can you process it a little
    at a time? If you can process it incrementally, maybe this will help:
    http://effbot.org/zone/element-iterparse.htm
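
A minimal sketch of the incremental approach described on that page, written for a current Python 3 where ElementTree lives in the standard library as xml.etree.ElementTree (an assumption; the original post used the standalone elementtree package). The function name count_records, the tag name "record", and the tiny in-memory sample are hypothetical stand-ins for the real file and its real per-record processing.

```python
import io
import xml.etree.ElementTree as ET  # stdlib location in current Python (assumption)


def count_records(source, tag):
    """Stream through `source` and count elements named `tag`."""
    count = 0
    # Request "start" events as well, so the very first event yields the root.
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1  # placeholder for real per-record processing
            # Discard what has been parsed so far so the tree does not grow.
            root.clear()
    return count


# Tiny in-memory stand-in for the real 1.3GB file (hypothetical data).
sample = io.BytesIO(b"<data><record id='1'/><record id='2'/><other/></data>")
print(count_records(sample, "record"))
```

Whether this is enough depends on the file's layout: it only helps if the document is a long sequence of reasonably small records rather than one deeply nested structure.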

    Kent
  • Dave Kuhlman at Nov 21, 2007 at 5:54 pm

    On Wed, Nov 21, 2007 at 09:02:47AM -0800, Srinivas Iyyer wrote:
    > I use ElementTree for XML work. I have a 1.3GB file
    > to parse.
    > [...]
    > Do you suggest any tips to circumvent the file opening
    > problem?

    If time is the problem, you might want to look at:

    - cElementTree -- See notes about cElementTree on this page:
    http://effbot.org/zone/elementtree-13-intro.htm

    - lxml -- http://codespeak.net/lxml/
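
For readers on a current Python 3, a hedged note on the two options above: the separate cElementTree module was removed in Python 3.9, and xml.etree.ElementTree has used its C accelerator automatically since Python 3.3, so only lxml still needs a separate import. A common way to try lxml first and fall back to the standard library might look like the sketch below; the sample XML is hypothetical.

```python
try:
    from lxml import etree as ET  # third-party; used only if it is installed
except ImportError:
    import xml.etree.ElementTree as ET  # stdlib fallback, C-accelerated in Python 3

# Both modules expose a largely compatible API for simple cases like this one.
root = ET.fromstring("<data><record id='1'/></data>")
print(root.find("record").get("id"))
```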

    If size/resources/memory are the issue, as must be the case for
    you, then SAX can be a solution. But switching to SAX requires a
    very radical redesign of your application, because SAX hands you a
    stream of events instead of a tree you can navigate.
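
To give a rough idea of what that redesign looks like, here is a small sketch using xml.sax from the standard library. The handler class RecordCounter, the helper count_with_sax, and the tag name "record" are hypothetical; a real application would keep whatever state it needs inside the handler.

```python
import io
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """Counts elements with a given name; no tree is ever built."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        # Called once for every opening tag as the parser streams the input.
        if name == self.tag:
            self.count += 1


def count_with_sax(source, tag):
    handler = RecordCounter(tag)
    xml.sax.parse(source, handler)  # accepts a filename or a file-like object
    return handler.count


# Tiny in-memory stand-in for the real file (hypothetical data).
sample = io.BytesIO(b"<data><record/><record/><other/></data>")
print(count_with_sax(sample, "record"))
```

All program logic ends up in callbacks such as startElement, which is why code written against a tree API usually has to be restructured.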

    You might also want to investigate pulldom. It's in the Python
    standard library. A quote:

    "PullDOM has 80% of the speed of SAX and 80% of the convenience
    of the DOM. There are still circumstances where you might need
    SAX (speed freak!) or DOM (complete random access). But IMO
    there are a lot more circumstances where the PullDOM middle
    ground is exactly what you need."

    The Python standard documentation on pulldom is next to none, but
    here are several links:

    http://www.prescod.net/python/pulldom.html
    http://www.ibm.com/developerworks/xml/library/x-tipulldom.html
    http://www.idealliance.org/papers/dx_xml03/papers/06-02-03/06-02-03.html
    http://www.idealliance.org/papers/dx_xml03/papers/06-02-03/06-02-03.html#pull
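
A short sketch of the pulldom style, using xml.dom.pulldom from the standard library: events are pulled one at a time, and only the elements of interest are expanded into DOM nodes. The function name record_ids, the tag "record", the attribute "id", and the sample data are hypothetical.

```python
import io
from xml.dom import pulldom


def record_ids(source, tag):
    """Collect the `id` attribute of every element named `tag`."""
    ids = []
    events = pulldom.parse(source)  # accepts a filename or a file-like object
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Build a DOM subtree for this one element only.
            events.expandNode(node)
            ids.append(node.getAttribute("id"))
    return ids


# Tiny in-memory stand-in for the real file (hypothetical data).
sample = io.BytesIO(b"<data><record id='a'><v>1</v></record><record id='b'/></data>")
print(record_ids(sample, "record"))
```

After expandNode, the node can be handled with the usual DOM methods, which is the convenience the quote above refers to.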

    Hope this helps.

    Dave
