I have a very large (200 MB) XML file that consists of multiple records. I
would like to split these records up and store the XML for each in a
database for quick retrieval. I simply need to echo all of the XML between
the enclosing record tags into the database. Ideally, I would use SAX to
parse things, but I can't figure out how to echo the data back out exactly
as I got it. Any clues?

Thanks,
Sean


  • Hanson, Rob at Jul 22, 2004 at 11:06 pm
    > Ideally, I would use SAX to parse things
    Alternatively, you could look at XML::RAX.

    Article on the RAX concept:
    http://www.xml.com/pub/a/2000/04/26/rax/index.html

    RAX allows you to specify a record separator (a tag in the XML file) and
    splits the file into chunks at that tag. It is stream-based, so it only
    reads in as much of the file as it needs to construct the next record. It
    only applies to XML files with that kind of record-oriented format, though
    (like RSS). At the very least you might find the code helpful.
    > but I can't figure out how to echo the data
    > back out exactly as I got it.
    I'm not sure I completely understand. Anyway, I'm out of here for today;
    I hope you find an answer.

    Rob


    -----Original Message-----
    From: Sean Davis
    Sent: Thursday, July 22, 2004 5:42 PM
    To: beginners@perl.org
    Subject: splitting large xml file


    --
    To unsubscribe, e-mail: beginners-unsubscribe@perl.org
    For additional commands, e-mail: beginners-help@perl.org
    <http://learn.perl.org/> <http://learn.perl.org/first-response>
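
Rob's description of XML::RAX (name a record-separator tag, then read the file as a stream, one record at a time) can be approximated in core Perl by setting the input record separator $/ to the closing tag. This is a minimal sketch, not XML::RAX's own API, and it does plain text matching rather than real XML parsing: the element name record, the helper name record_iterator and the sample data are hypothetical, and it assumes records are not nested and that the closing tag never appears inside a comment or CDATA section. Because nothing is decoded, each record comes back byte-for-byte as it appeared in the file, which is what Sean was after; a SAX handler, by contrast, receives decoded character data (for example &amp; arrives as &), so re-serializing from SAX events does not reproduce the original text exactly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# record_iterator($fh, $tag): returns a closure that yields the next raw
# <$tag ...>...</$tag> chunk exactly as it appears in the input, or undef
# at end of file. Only one record is held in memory at a time.
#
# Hypothetical assumptions: plain text matching (not an XML parser and not
# the XML::RAX API); records are not nested; the closing tag never appears
# inside a comment or CDATA section.
sub record_iterator {
    my ($fh, $tag) = @_;
    my $close = "</$tag>";
    return sub {
        local $/ = $close;    # each read stops just after the next closing tag
        while (defined(my $chunk = <$fh>)) {
            # The trailing text after the last record has no closing tag.
            next unless $chunk =~ /\Q$close\E\z/;
            # Skip whatever precedes the opening tag (XML declaration, root
            # element, whitespace between records). The lookahead keeps
            # <records> from matching when $tag is "record".
            if ($chunk =~ /<\Q$tag\E(?=[\s>])/) {
                return substr($chunk, $-[0]);
            }
        }
        return;    # end of input
    };
}

# Small in-memory stand-in for the 200 MB file.
my $xml = <<'XML';
<?xml version="1.0"?>
<records>
  <record id="1"><title>First &amp; foremost</title></record>
  <record id="2">
    <title>Second</title>
  </record>
</records>
XML

open my $fh, '<', \$xml or die "Cannot open in-memory file: $!";
my $next = record_iterator($fh, 'record');
my @records;
while (defined(my $rec = $next->())) {
    push @records, $rec;    # in practice: INSERT $rec into the database
}
close $fh;

print "$_\n---\n" for @records;
```

Note that &amp; survives untouched in the first record, and the indentation inside the second record is kept as-is.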
  • Sean Davis at Jul 23, 2004 at 11:53 am
    Rob,

    Thanks for replying. I ended up answering my own question. I used
    XML::Twig to find the chunks I was interested in, grabbed indexing
    information from the twig, then saved the indices in a database for
    later lookup of the entire XML record, and... presto, random access to
    200 MB of XML!

    Sean
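
Sean's approach (index each record once, then fetch it on demand) can be sketched without XML::Twig by recording each record's byte offset and length with tell during a single pass, and later jumping straight to it with seek and read. This is a minimal sketch under stated assumptions, not Sean's actual code: the element name record, the id attribute used as the lookup key, and the helper names build_index and fetch_record are hypothetical, and the same no-nesting, no-closing-tag-in-CDATA caveats apply. Only the (id, offset, length) triples need to go into the database; the 200 MB file itself stays on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# build_index($fh, $tag): one sequential pass over the file that records,
# for every <$tag id="..."> element, its byte offset and byte length.
# Returns a hash ref of id => [offset, length]; these are what would be
# stored in the database, while the XML itself stays in the file.
#
# Hypothetical assumptions: each record's start tag carries an id="..."
# attribute, records are not nested, and the closing tag never appears
# inside a comment or CDATA section. Plain tell/seek scanning, not XML::Twig.
sub build_index {
    my ($fh, $tag) = @_;
    my $close = "</$tag>";
    my %index;
    local $/ = $close;          # read one record-terminated chunk at a time
    my $pos = tell $fh;         # byte offset where the next chunk begins
    while (defined(my $chunk = <$fh>)) {
        if ($chunk =~ /\Q$close\E\z/
            && $chunk =~ /<\Q$tag\E\s[^>]*?\bid="([^"]*)"/)
        {
            my ($id, $start) = ($1, $-[0]);    # $-[0]: where "<$tag" begins
            $index{$id} = [ $pos + $start, length($chunk) - $start ];
        }
        $pos = tell $fh;
    }
    return \%index;
}

# fetch_record($fh, $entry): random access to a single record.
sub fetch_record {
    my ($fh, $entry) = @_;
    my ($offset, $len) = @$entry;
    seek($fh, $offset, SEEK_SET) or die "seek failed: $!";
    my $buf;
    my $got = read($fh, $buf, $len);
    die "short read at offset $offset\n" unless defined $got && $got == $len;
    return $buf;
}

# Small in-memory stand-in for the real file.
my $xml = <<'XML';
<?xml version="1.0"?>
<records>
  <record id="a17"><title>First</title></record>
  <record id="b42"><title>Second</title></record>
</records>
XML

open my $fh, '<', \$xml or die "Cannot open in-memory file: $!";
my $index = build_index($fh, 'record');
my $rec   = fetch_record($fh, $index->{b42});
print "$rec\n";    # the b42 record, exactly as stored in the file
```

For a real file, open it with the '<:raw' layer so that string lengths are byte counts and the stored offsets remain valid for seek.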

Discussion Overview
group: beginners
categories: perl
posted: Jul 22, '04 at 9:51 pm
active: Jul 23, '04 at 11:53 am
posts: 3
users: 2
website: perl.org

2 users in discussion

Sean Davis: 2 posts; Hanson, Rob: 1 post
