Add tokenize by pair to be able to split big XML files in streaming mode
------------------------------------------------------------------------

Key: CAMEL-4595
URL: https://issues.apache.org/jira/browse/CAMEL-4595
Project: Camel
Issue Type: New Feature
Components: camel-core
Reporter: Claus Ibsen
Assignee: Claus Ibsen
Fix For: 2.9.0


Using XPath to split big XML files is not optimal as the JDK XPath framework doesn't support streaming mode yet. It may come in the future.

So instead we can introduce a tokenizer that grabs the XML content between start/end tokens. Then we can parse big files with a very low memory footprint.
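The start/end token idea can be illustrated with a minimal, hypothetical `PairTokenizer` in plain Java. This is not the code that went into camel-core. It streams the input and keeps at most one fragment in memory, which is where the low memory footprint comes from. It assumes the tokens are matched as literal text, so an element must start with exactly `startToken` (for example, no attributes), and elements with the same name must not be nested.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical "tokenize by pair" splitter (illustration only, not camel-core code).
 * Emits every fragment that starts with startToken and ends with endToken,
 * holding at most one fragment in memory at a time.
 */
public final class PairTokenizer {

    private final String startToken;
    private final String endToken;

    public PairTokenizer(String startToken, String endToken) {
        this.startToken = startToken;
        this.endToken = endToken;
    }

    /** Streams the input and passes each complete start...end fragment to onFragment. */
    public void split(Reader input, Consumer<String> onFragment) {
        StringBuilder buffer = new StringBuilder();
        boolean inside = false;
        try {
            // the caller owns (and closes) the underlying Reader
            Reader reader = new BufferedReader(input);
            int c;
            while ((c = reader.read()) != -1) {
                buffer.append((char) c);
                if (!inside) {
                    if (endsWith(buffer, startToken)) {
                        // start token found: begin a new fragment with it
                        inside = true;
                        buffer.setLength(0);
                        buffer.append(startToken);
                    } else if (buffer.length() > startToken.length()) {
                        // outside a fragment, keep only enough text to spot the next start token
                        buffer.delete(0, buffer.length() - startToken.length());
                    }
                } else if (endsWith(buffer, endToken)) {
                    // end token found: hand the fragment over and forget it
                    onFragment.accept(buffer.toString());
                    buffer.setLength(0);
                    inside = false;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // an unterminated trailing fragment is silently dropped
    }

    private static boolean endsWith(StringBuilder sb, String suffix) {
        int offset = sb.length() - suffix.length();
        if (offset < 0) {
            return false;
        }
        for (int i = 0; i < suffix.length(); i++) {
            if (sb.charAt(offset + i) != suffix.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String xml = "<orders><order>1</order><order>2</order></orders>";
        List<String> orders = new ArrayList<>();
        new PairTokenizer("<order>", "</order>").split(new StringReader(xml), orders::add);
        System.out.println(orders); // prints [<order>1</order>, <order>2</order>]
    }
}
```

Matching literal text instead of building a DOM or XPath model is what keeps memory flat, at the cost of not understanding namespaces, attributes or nesting.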



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


  • Raul Kripalani (Commented) (JIRA) at Oct 29, 2011 at 10:03 pm
    [ https://issues.apache.org/jira/browse/CAMEL-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139467#comment-13139467 ]

    Raul Kripalani commented on CAMEL-4595:
    ---------------------------------------

    Claus - I'm wondering whether it would make sense to look into leveraging libraries such as:
    * VTD-XML (http://vtd-xml.sourceforge.net)
    * Nux (http://acs.lbl.gov/software/nux)
    * Aalto (http://wiki.fasterxml.com/AaltoHome)

    What do you think?
  • Raul Kripalani (Issue Comment Edited) (JIRA) at Oct 29, 2011 at 10:57 pm
    [ https://issues.apache.org/jira/browse/CAMEL-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139467#comment-13139467 ]

    Raul Kripalani edited comment on CAMEL-4595 at 10/29/11 10:57 PM:
    ------------------------------------------------------------------

    Claus - I'm wondering whether it would make sense to look into leveraging libraries such as:
    * VTD-XML (http://vtd-xml.sourceforge.net)
    * Nux (http://acs.lbl.gov/software/nux)
    * Aalto (http://wiki.fasterxml.com/AaltoHome)

    What do you think?

  • Claus Ibsen (Commented) (JIRA) at Oct 30, 2011 at 9:53 am
    [ https://issues.apache.org/jira/browse/CAMEL-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139573#comment-13139573 ]

    Claus Ibsen commented on CAMEL-4595:
    ------------------------------------

    VTD is GPL, so we can't host it at Apache.
    Nux seems to have an incompatible license as well.
    And Aalto is indeed Apache licensed, but it is only maintained by one guy and the project doesn't seem to have taken off.

    Anyway, people in the community are of course free to build additional components.

  • Claus Ibsen (Resolved) (JIRA) at Oct 30, 2011 at 10:13 am
    [ https://issues.apache.org/jira/browse/CAMEL-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Claus Ibsen resolved CAMEL-4595.
    --------------------------------

    Resolution: Fixed
  • Daniel Kulp (Commented) (JIRA) at Oct 31, 2011 at 2:10 am
    [ https://issues.apache.org/jira/browse/CAMEL-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139861#comment-13139861 ]

    Daniel Kulp commented on CAMEL-4595:
    ------------------------------------


    Nux is under a BSD license, which is compatible. That said, no updates since 2006 is a big concern.

Discussion Overview
group: dev @
categories: camel
posted: Oct 29, '11 at 12:41p
active: Oct 31, '11 at 2:10a
posts: 6
users: 1
website: camel.apache.org

1 user in discussion

Daniel Kulp (Commented) (JIRA): 6 posts
