Dave Kuhlman, Nov 21, 2007 at 5:54 pm
On Wed, Nov 21, 2007 at 09:02:47AM -0800, Srinivas Iyyer wrote:
> Dear tutors,
> I use ElementTree for XML work. I have a 1.3GB file
> to parse.
> It takes a lot of time to open my input XML file.
> Is that because of my hardware limitation, or am I
> using a blunt method to load the file?
>
> My computer config:
> Intel(R) Pentium(R) 4 CPU 2.80GHz
> 2.79GHz, 0.99GB of RAM
>
> from elementtree import ElementTree
> myfile = open('myXML.out','r')
>
> Do you suggest any tip to circumvent the file opening
> problem?

If time is the problem, you might want to look at:

- cElementTree -- See notes about cElementTree on this page:
  http://effbot.org/zone/elementtree-13-intro.htm
- lxml -- http://codespeak.net/lxml/

If size/resources/memory are the issue, as must be the case for
you, then SAX can be a solution. But switching to SAX requires a
very radical redesign of your application.
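
To give a feel for what that redesign looks like, here is a minimal
sketch of a SAX handler, assuming all you need is a count of each tag.
The names TagCounter and count_tags and the sample data are made up
for illustration; they are not from your application. The point is
that the parser calls your handler for each element and never builds
a tree, so memory use stays small no matter how big the file is.

```python
import io
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts start tags by name, builds no tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser once for every opening tag it reads.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(source):
    """Stream `source` (a filename or a binary file object) through SAX."""
    handler = TagCounter()
    xml.sax.parse(source, handler)
    return handler.counts


# Small in-memory stand-in for the real file; pass 'myXML.out' instead.
sample = io.BytesIO(b"<root><rec id='1'/><rec id='2'/><note>hi</note></root>")
counts = count_tags(sample)
print(counts)
```

Any state you need (which element you are inside, text collected so
far) has to be tracked by hand in the handler, which is where the
redesign cost comes from.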
You might also want to investigate pulldom. It's in the Python
standard library. A quote:
"PullDOM has 80% of the speed of SAX and 80% of the convenience
of the DOM. There are still circumstances where you might need
SAX (speed freak!) or DOM (complete random access). But IMO
there are a lot more circumstances where the PullDOM middle
ground is exactly what you need."
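
As a rough sketch of that middle ground, assuming your file is a long
run of repeated record elements: pulldom streams events like SAX, and
you expand only the element you care about into a small DOM node. The
tag name "rec", the function iter_records and the sample data are
invented for the example.

```python
import io
from xml.dom import pulldom


def iter_records(source, tag):
    """Yield one small DOM node per <tag> element, one at a time."""
    events = pulldom.parse(source)  # filename or binary file object
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Build the subtree for this one element only.
            events.expandNode(node)
            yield node


# Small in-memory stand-in for the real file; pass 'myXML.out' instead.
sample = io.BytesIO(b"<root><rec id='1'>a</rec><rec id='2'>b</rec></root>")
records = [(rec.getAttribute("id"), rec.firstChild.data)
           for rec in iter_records(sample, "rec")]
print(records)
```

Inside the loop each record is an ordinary DOM node, so the usual
DOM methods work on it, but only one record is held at a time.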
The Python standard documentation on pulldom is next to none, but
here are several links:

http://www.prescod.net/python/pulldom.html
http://www.ibm.com/developerworks/xml/library/x-tipulldom.html
http://www.idealliance.org/papers/dx_xml03/papers/06-02-03/06-02-03.html
http://www.idealliance.org/papers/dx_xml03/papers/06-02-03/06-02-03.html#pull

Hope this helps.
Dave