Hi,

I am glad to present a new XML parser.
node-xml-lite is a pure JavaScript ANSI/Unicode SAX XML parser.
Its only dependency is iconv-lite, which is loaded only when necessary.

https://github.com/hgourvest/node-xml-lite

--
Henri Gourvest

--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nodejs@googlegroups.com
To unsubscribe from this group, send email to
nodejs+unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en


  • Dhruvbird at Jul 9, 2012 at 5:38 am
    Why don't you make it compatible with the node-expat interface? It's sure
    to drive up adoption. Plus, since this is pure JavaScript, it can be used
    from within a browser.

    Also, how is this different from node-xml?
    https://github.com/robrighter/node-xml
  • Henri Gourvest at Jul 9, 2012 at 7:32 am

    On 09/07/2012 07:38, dhruvbird wrote:
    > Why don't you make it compatible with the node-expat interface. It's
    > sure to drive up adoption.
    I follow my own xpath ;)
    > Plus since this is pure javascript, it can be
    > used from within a browser.
    No, it is only for Node.js; I need to use the Buffer and File System classes.
    > Also, how is this different from node-xml? -
    > https://github.com/robrighter/node-xml
    Mine is an ANSI parser. I needed an ANSI parser for my own projects, which
    is why I wrote it.

  • Henri Gourvest at Jul 9, 2012 at 9:52 am

    On 09/07/2012 07:38, dhruvbird wrote:
    > Also, how is this different from node-xml? -
    > https://github.com/robrighter/node-xml
    https://github.com/robrighter/node-xml/issues/18

    node-xml has to read the whole file before decoding, and that is a problem
    that can't be solved.

    node-xml-lite does not have this problem because it is an ANSI parser.
    node-xml expects the XML file to be UTF-8, so if it does not decode the
    entire file at once, characters can be lost.

    I can see you are interested in XMPP. You could use node-xml-lite to
    decode an XMPP stream, but you can't use node-xml because of this.


  • Dhruvbird at Jul 10, 2012 at 6:02 am

    On Monday, July 9, 2012 2:52:31 AM UTC-7, Henri Gourvest wrote:
    > On 09/07/2012 07:38, dhruvbird wrote:
    >> Also, how is this different from node-xml? -
    >> https://github.com/robrighter/node-xml
    > https://github.com/robrighter/node-xml/issues/18
    >
    > node-xml has to read the whole file before decoding, and that is a problem
    > that can't be solved.
    Interesting. I wasn't aware of this drawback; I always thought that it had a
    streaming interface.

    > node-xml-lite does not have this problem because it is an ANSI parser.
    What do you mean by "ANSI parser"?

    > node-xml expects the XML file to be UTF-8, so if it does not decode the
    > entire file at once, characters can be lost.
    That's okay, because I guess it can be hacked to support a different encoding.

    > I can see you are interested in XMPP. You could use node-xml-lite to
    > decode an XMPP stream, but you can't use node-xml because of this.
    True. I'm mostly interested because of XMPP stanza parsing. Again, having a
    node-expat compatible interface would be useful (imho).

    Thanks!

  • Henri Gourvest at Jul 10, 2012 at 7:42 am

    On 10/07/2012 08:02, dhruvbird wrote:
    > What do you mean by "ANSI parser"?
    ANSI = 1 byte/character
    Unicode in JS = 2 bytes/character (UCS-2)
    It is an ANSI parser because it can parse input from a Buffer, which is
    an array of bytes.
    Most of the time XML is encoded in UTF-8.
    UTF-8 is a special ANSI code page in which all Unicode characters can be
    encoded. In UTF-8, a character can be encoded in one or more bytes.

    If you decode an XML document in chunks, a chunk boundary can cut a
    character in two. If the partial chunk is decoded from UTF-8 to Unicode
    before being parsed by node-xml, a character will be lost, ouch!

    This can't happen with an ANSI parser.
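
To make the chunk-boundary problem concrete, here is a minimal sketch using only Node.js built-ins. It is not taken from node-xml or node-xml-lite. The sample document and the cut position are assumptions chosen so that the cut lands inside a two-byte character. It compares decoding each chunk on its own with decoding through `StringDecoder`, which holds back an incomplete trailing sequence until the next chunk arrives.

```javascript
const { StringDecoder } = require('string_decoder');

// "é" is two bytes in UTF-8 (0xC3 0xA9). "<a>" is three bytes, so cutting
// at index 4 separates 0xC3 from 0xA9.
const whole = Buffer.from('<a>é</a>', 'utf8');
const cut = 4;
const chunks = [whole.subarray(0, cut), whole.subarray(cut)];

// Naive approach: decode every chunk independently. Each half of the
// split character becomes U+FFFD (the replacement character).
const naive = chunks.map((c) => c.toString('utf8')).join('');

// Stateful approach: StringDecoder buffers the incomplete trailing byte
// and emits the character once the rest of it has arrived.
const decoder = new StringDecoder('utf8');
const safe = chunks.map((c) => decoder.write(c)).join('') + decoder.end();

console.log(JSON.stringify(naive));
console.log(JSON.stringify(safe));
```

A parser that works on the bytes directly, as described above, avoids the problem in a different way: it never decodes a chunk to a string before the markup has been tokenized.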

  • Ryan Schmidt at Jul 10, 2012 at 7:54 am

    On Jul 10, 2012, at 02:42, Henri Gourvest wrote:

    > On 10/07/2012 08:02, dhruvbird wrote:
    >> What do you mean by "ANSI parser"?
    > ANSI = 1 byte/character
    A curious and highly personal definition to be sure.


  • Henri Gourvest at Jul 10, 2012 at 11:26 am

    On 10/07/12 09:54, Ryan Schmidt wrote:
    > A curious and highly personal definition to be sure.
    What name should I use instead?
    ASCII?

  • Dhruvbird at Jul 10, 2012 at 5:04 pm
    Why not UTF-8? I mean, JavaScript strings are Unicode by default, so what's
    wrong with supporting international characters (this is probably very
    important)?
  • Henri Gourvest at Jul 11, 2012 at 7:04 am

    On 10/07/2012 19:04, dhruvbird wrote:
    > Why not utf-8
    It supports UTF-8; I don't know why you think it doesn't.


  • Ryan Schmidt at Jul 11, 2012 at 9:48 am

    On Jul 11, 2012, at 02:04, Henri Gourvest wrote:

    > On 10/07/2012 19:04, dhruvbird wrote:
    >> Why not utf-8
    > It supports UTF-8; I don't know why you think it doesn't.
    You are currently calling node-xml-lite an "XML ANSI/Unicode SAX parser".


    The "Unicode" portion of that description is redundant since by definition all XML documents are composed of characters from the Unicode character set:

    http://www.w3.org/TR/REC-xml/#charsets

    There are numerous character encodings that can represent characters from the Unicode character set. All XML parsers must support UTF-8 and UTF-16, and they may support others as well. The XML specification provides guidance on how a parser can detect which encoding is being used:

    http://www.w3.org/TR/REC-xml/#sec-guessing
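
As a rough illustration of that detection guidance, the sketch below inspects the first bytes of a document for a byte order mark or for the byte pattern of "<?" in UTF-16. The function name `sniffXmlEncoding` is hypothetical, the sketch ignores UTF-32 and EBCDIC, and it is not how node-xml-lite does it. A real parser would still read the encoding declaration to confirm the result.

```javascript
// Guess the encoding family of an XML document from its first bytes.
// Covers only byte order marks and UTF-16 documents without one.
function sniffXmlEncoding(buf) {
  const startsWith = (...bytes) =>
    buf.length >= bytes.length && bytes.every((b, i) => buf[i] === b);

  if (startsWith(0xef, 0xbb, 0xbf)) return 'UTF-8';          // UTF-8 BOM
  if (startsWith(0xfe, 0xff)) return 'UTF-16BE';             // big-endian BOM
  if (startsWith(0xff, 0xfe)) return 'UTF-16LE';             // little-endian BOM
  if (startsWith(0x00, 0x3c, 0x00, 0x3f)) return 'UTF-16BE'; // "<?" without BOM
  if (startsWith(0x3c, 0x00, 0x3f, 0x00)) return 'UTF-16LE'; // "<?" without BOM
  // ASCII-compatible start: UTF-8 is the default, but the encoding
  // declaration (if present) may name another single-byte encoding.
  return 'UTF-8';
}

console.log(sniffXmlEncoding(Buffer.from('<?xml version="1.0"?>', 'utf16le')));
console.log(sniffXmlEncoding(Buffer.from('<root/>', 'utf8')));
```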


    "ANSI" does not refer to a particular character encoding. It might refer to the various Windows code pages:

    http://en.wikipedia.org/wiki/Windows_code_page#ANSI_code_page

    By listing "ANSI" in the description, are you trying to say that you explicitly support all of those character encodings? Or you might be referring specifically to the Windows-1252 character encoding only. If it is your intention to indicate what character encodings your parser supports, then to reduce confusion, it might be better to list those character encodings by their most common unambiguous names.

    The Windows code pages are not ISO or ANSI standards, but several of them are supersets of ISO standards. For example, Windows-1252 is a superset of ISO-8859-1. If you're going to claim compatibility with Windows-1252, then you're probably also compatible with ISO-8859-1.


    "Why not utf-8" was not asking why UTF-8 isn't supported; it goes without saying that it is. Rather, it was in response to your question whether you should change "ANSI" to "ASCII" in your module's description:
    On Jul 10, 2012, at 06:25, Henri Gourvest wrote:

    > What name should I use instead?
    > ASCII?
    "ASCII" of course refers to the 7-bit character encoding of which many other character encodings (including the Windows code pages and the UTF encodings) are a superset:

    http://en.wikipedia.org/wiki/ASCII

    Thus mentioning that you support ASCII is redundant, since by specification you are required to support UTF-8, and UTF-8 includes all of ASCII.


    ~ ~ ~


    In the end, it comes down to what you wrote earlier:
    On Jul 10, 2012, at 02:42, Henri Gourvest wrote:

    > On 10/07/2012 08:02, dhruvbird wrote:
    >> What do you mean by "ANSI parser"?
    > ANSI = 1 byte/character
    > Unicode in JS = 2 bytes/character (UCS-2)
    > It is an ANSI parser because it can parse input from a Buffer, which is an array of bytes.
    > Most of the time XML is encoded in UTF-8.
    > UTF-8 is a special ANSI code page in which all Unicode characters can be encoded. In UTF-8, a character can be encoded in one or more bytes.

    Based on this, I think you don't mean "ANSI" at all. The American National Standards Institute had nothing to do with Unicode or UTF-8. The Unicode standard is published by the Unicode Consortium and kept synchronized with ISO/IEC 10646, from the International Organization for Standardization. UTF-8 is defined in RFC 3629, a Request for Comments document published through the IETF.

    I think you're just trying to convey the idea that, in the event of multibyte characters, the buffer you're using might contain partial / incomplete / fractional characters, and that your parser is able to accommodate that situation, presumably by waiting to handle them until the rest of the bytes that make up the character have arrived in the buffer. That's great, and I'd just suggest that you explain it as such at the beginning of your read me, and discard the "Unicode", "ANSI" or "ASCII" labels, since that's not what they convey. The first paragraph of your read me should be written to give users an understanding of what your module is for and to clearly differentiate it from the alternatives.



  • Henri Gourvest at Jul 11, 2012 at 10:08 am
    > I'd just suggest that you explain it as such at the beginning of your read me, and discard the "Unicode", "ANSI" or "ASCII" labels, since that's not what they convey. The first paragraph of your read me should be written to give users an understanding of what your module is for and to clearly differentiate it from the alternatives.
    Thank you for the explanations. I will write a better description of the
    project.

  • Dhruvbird at Jul 10, 2012 at 5:10 pm

    On Tuesday, July 10, 2012 12:42:35 AM UTC-7, Henri Gourvest wrote:

    > If you want to decode a xml document by chunks, you can cut a character,
    > if the partial chunk is decoded from UTF8 to Unicode to be parsed by
    > node-xml there will be a character lost, ouch!
    Sorry, but I didn't understand the paragraph above. Could you please
    elaborate on what you mean by it?

    Are you trying to say that if the Buffer() contains half a utf8 character,
    then the output stream will print just half the character? IIRC, utf8 is a
    self synchronizing stream, so as long as you are checking each byte to be
    something that has the high-bit reset, you should be generally okay. Does
    anyone see any problem with this?

  • Alan Gutierrez at Jul 11, 2012 at 12:36 am

    On Tue, Jul 10, 2012 at 10:09:57AM -0700, dhruvbird wrote:
    > On Tuesday, July 10, 2012 12:42:35 AM UTC-7, Henri Gourvest wrote:
    >
    >> If you want to decode a xml document by chunks, you can cut a character,
    >> if the partial chunk is decoded from UTF8 to Unicode to be parsed by
    >> node-xml there will be a character lost, ouch!
    > Sorry, but I didn't understand the paragraph above. Please could you
    > elaborate what you mean by it?
    >
    > Are you trying to say that if the Buffer() contains half a utf8 character,
    > then the output stream will print just half the character? IIRC, utf8 is a
    > self synchronizing stream, so as long as you are checking each byte to be
    > something that has the high-bit reset, you should be generally okay. Does
    > anyone see any problem with this?
    I believe you need to look at the end of the buffer to check that the
    high bit of the last byte is not set. If it is set, you need to move back
    until you reach either a byte with the high bit clear, which is then the
    last character in the string, or a byte with the top two bits set, in which
    case the byte before it is the last byte in the string. Then you carry over
    any bytes you have not used to the next buffer.
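
The carry-over idea described above could be sketched as follows. This is an illustration under assumptions, not code from any of the libraries discussed: `incompleteTailLength` and `makeChunkDecoder` are hypothetical names. The sketch refines the backward scan slightly: when it finds a lead byte, it checks whether enough continuation bytes follow, so a complete multi-byte character at the end of the buffer is not carried over needlessly.

```javascript
// Return how many bytes at the end of buf belong to an unfinished UTF-8
// sequence (0 when the buffer ends on a character boundary).
function incompleteTailLength(buf) {
  // A UTF-8 sequence is at most 4 bytes long, so look back at most 4 bytes.
  for (let back = 1; back <= Math.min(4, buf.length); back++) {
    const byte = buf[buf.length - back];
    if ((byte & 0x80) === 0) return 0;     // high bit clear: ASCII, boundary
    if ((byte & 0xc0) === 0x80) continue;  // 10xxxxxx: continuation, keep going
    // Top two bits set: lead byte. Work out the expected sequence length.
    const need = byte >= 0xf0 ? 4 : byte >= 0xe0 ? 3 : 2;
    return back < need ? back : 0;
  }
  return 0; // only continuation bytes seen: malformed, leave it to the decoder
}

// Build a decoder that carries unused trailing bytes over to the next chunk.
function makeChunkDecoder() {
  let carry = Buffer.alloc(0);
  return function decode(chunk) {
    const buf = Buffer.concat([carry, chunk]);
    const tail = incompleteTailLength(buf);
    carry = buf.subarray(buf.length - tail);
    return buf.toString('utf8', 0, buf.length - tail);
  };
}

// Usage: "€" is three bytes (0xE2 0x82 0xAC); cut the input inside it.
const input = Buffer.from('a€b', 'utf8');
const decode = makeChunkDecoder();
console.log(decode(input.subarray(0, 2)) + decode(input.subarray(2)));
```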

    --
    Alan Gutierrez - http://twitter.com/bigeasy - http://github.com/bigeasy

  • Angelo Chen at Jul 18, 2012 at 3:34 pm
    Looks good. Is it pure JS? Does it support XPath? Is there a sample?
  • Henri Gourvest at Jul 18, 2012 at 3:40 pm

    On 18/07/2012 17:34, Angelo Chen wrote:
    > Looks good, is it a pure js?
    Yes, it is 100% pure JS.

    > Can it support xpath?
    No.

    > A sample ?
    See the readme:
    https://github.com/hgourvest/node-xml-lite/blob/master/README.md



  • Angelo Chen at Jul 18, 2012 at 3:56 pm
    How do I extract an element from an XML string?
    Thanks
  • Henri Gourvest at Jul 18, 2012 at 7:24 pm
    On 18/07/12 17:56, Angelo Chen wrote:
    > How to extract a element from a XML string?
    The non-SAX functions return objects like this:

    {
    name: "root",
    attrib: {...},
    childs: [...]
    }
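
Given an object of that shape, extracting elements is a plain tree walk. The sketch below is only an illustration: the sample tree is hand-written rather than produced by node-xml-lite, `findElements` is a hypothetical helper, and it assumes that `childs` may mix element objects with text strings.

```javascript
// Hypothetical tree in the { name, attrib, childs } shape shown above.
const root = {
  name: 'root',
  attrib: { version: '1' },
  childs: [
    { name: 'item', attrib: { id: 'a' }, childs: ['first'] },
    {
      name: 'group',
      attrib: {},
      childs: [{ name: 'item', attrib: { id: 'b' }, childs: ['second'] }],
    },
  ],
};

// Collect every element with the given name, depth first.
function findElements(node, name, found = []) {
  if (node.name === name) found.push(node);
  for (const child of node.childs || []) {
    // Skip anything that is not an element object (for example text strings).
    if (child && typeof child === 'object') findElements(child, name, found);
  }
  return found;
}

console.log(findElements(root, 'item').map((el) => el.attrib.id)); // [ 'a', 'b' ]
```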



  • Dhruvbird at Jul 19, 2012 at 6:43 am
    Is it possible to feed this parser an incomplete XML stream and then pass
    it the rest of the stream? Something like a sax parser? I'm asking since
    all the examples show that you are passing a complete xml document to the
    parser every time.


  • Henri Gourvest at Jul 19, 2012 at 9:05 am
    Files are always read in chunks into a buffer and parsed.
    The last sample in the readme, "providing your own data to SAX parser",
    shows you how to do that.
    But I think I should provide a function that takes a callback function
    as a parameter.




Discussion Overview
group: nodejs
categories: nodejs
posted: Jul 8, '12 at 1:48p
active: Jul 19, '12 at 9:05a
posts: 20
users: 5
website: nodejs.org
irc: #node.js
