On Jul 11, 2012, at 02:04, Henri Gourvest wrote:
On 10/07/2012 19:04, dhruvbird wrote:
Why not utf-8
It supports UTF-8; I don't know why you think it doesn't.
You are currently calling node-xml-lite an "XML ANSI/Unicode SAX parser".
The "Unicode" portion of that description is redundant since by definition all XML documents are composed of characters from the Unicode character set: http://www.w3.org/TR/REC-xml/#charsets
There are numerous character encodings that can represent characters from the Unicode character set. All XML parsers must support UTF-8 and UTF-16, and they may support others as well. The XML specification provides guidance on how a parser can detect which encoding is being used: http://www.w3.org/TR/REC-xml/#sec-guessing
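To make the detection idea concrete, here is a rough TypeScript sketch of the byte-sniffing step described in that appendix. It is deliberately simplified: it only distinguishes UTF-8 from UTF-16, it ignores the UCS-4 and EBCDIC cases, and it does not read the encoding declaration, which a real parser would still need to do for other encodings. The names `guessXmlEncoding` and `GuessedEncoding` are made up for this illustration.

```typescript
// Illustrative only: a simplified take on the autodetection described in the
// XML specification. UCS-4, EBCDIC and the encoding declaration are ignored.
type GuessedEncoding = "UTF-8" | "UTF-16BE" | "UTF-16LE";

// True if `bytes` begins with the given byte values.
function startsWith(bytes: Uint8Array, prefix: number[]): boolean {
  if (bytes.length < prefix.length) return false;
  return prefix.every((b, i) => bytes[i] === b);
}

// Hypothetical helper: guess the encoding from the first bytes of a document.
// `bomLength` is the number of leading bytes to skip before parsing.
function guessXmlEncoding(bytes: Uint8Array): { encoding: GuessedEncoding; bomLength: number } {
  // Byte order marks come first.
  if (startsWith(bytes, [0xef, 0xbb, 0xbf])) return { encoding: "UTF-8", bomLength: 3 };
  if (startsWith(bytes, [0xfe, 0xff])) return { encoding: "UTF-16BE", bomLength: 2 };
  if (startsWith(bytes, [0xff, 0xfe])) return { encoding: "UTF-16LE", bomLength: 2 };
  // No BOM: look for "<?" encoded as two-byte code units.
  if (startsWith(bytes, [0x00, 0x3c, 0x00, 0x3f])) return { encoding: "UTF-16BE", bomLength: 0 };
  if (startsWith(bytes, [0x3c, 0x00, 0x3f, 0x00])) return { encoding: "UTF-16LE", bomLength: 0 };
  // Otherwise assume UTF-8, the default when nothing says otherwise.
  return { encoding: "UTF-8", bomLength: 0 };
}
```

For example, a buffer starting with the bytes EF BB BF would be reported as UTF-8 with three bytes to skip.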
"ANSI" does not refer to a particular character encoding. It might refer to the various Windows code pages: http://en.wikipedia.org/wiki/Windows_code_page#ANSI_code_page
By listing "ANSI" in the description, are you trying to say that you explicitly support all of those character encodings? Or are you referring only to the Windows-1252 character encoding? If your intention is to indicate which character encodings your parser supports, then to reduce confusion, it would be better to list those encodings by their most common unambiguous names.
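The Windows code pages are not ISO or ANSI standards, though some of them extend ISO standards. For example, Windows-1252 is a superset of ISO-8859-1 as far as printable characters go: it assigns printable characters to the byte range 0x80-0x9F, which ISO-8859-1 leaves to control codes. If you're going to claim compatibility with Windows-1252, then you can probably also handle ISO-8859-1 text.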
"Why not utf-8" was not asking why UTF-8 isn't supported; it goes without saying that it is. Rather, it was in response to your question whether you should change "ANSI" to "ASCII" in your module's description:
On Jul 10, 2012, at 06:25, Henri Gourvest wrote:
what name should I use instead?
"ASCII" of course refers to the 7-bit character encoding of which many other character encodings (including the Windows code pages and UTF-8) are a superset: http://en.wikipedia.org/wiki/ASCII
Thus mentioning that you support ASCII is redundant, since by specification you are required to support UTF-8, and UTF-8 includes all of ASCII.
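To show why ASCII comes for free with UTF-8, here is a minimal TypeScript sketch of the UTF-8 encoding rule. The helper name `utf8Encode` is my own invention for this example and is not part of any module. The first branch is the point of interest: every code point below 0x80 encodes to a single byte with the same value, which is exactly its ASCII byte.

```typescript
// Illustrative helper (not a library API): encode one Unicode code point
// as a sequence of UTF-8 bytes.
function utf8Encode(codePoint: number): number[] {
  const isInteger = Math.floor(codePoint) === codePoint;
  const isSurrogate = codePoint >= 0xd800 && codePoint <= 0xdfff;
  if (!isInteger || codePoint < 0 || codePoint > 0x10ffff || isSurrogate) {
    throw new RangeError("not a Unicode scalar value: " + codePoint);
  }
  // ASCII range: one byte, identical to the ASCII encoding.
  if (codePoint < 0x80) return [codePoint];
  // Two bytes: 110xxxxx 10xxxxxx
  if (codePoint < 0x800) {
    return [0xc0 | (codePoint >> 6), 0x80 | (codePoint & 0x3f)];
  }
  // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
  if (codePoint < 0x10000) {
    return [
      0xe0 | (codePoint >> 12),
      0x80 | ((codePoint >> 6) & 0x3f),
      0x80 | (codePoint & 0x3f),
    ];
  }
  // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3f),
    0x80 | ((codePoint >> 6) & 0x3f),
    0x80 | (codePoint & 0x3f),
  ];
}
```

So `utf8Encode(0x41)` (the letter "A") gives the single byte 0x41, while `utf8Encode(0x20ac)` (the euro sign) gives the three bytes E2 82 AC.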
~ ~ ~
In the end, it comes down to what you wrote earlier:
On Jul 10, 2012, at 02:42, Henri Gourvest wrote:
On 10/07/2012 08:02, dhruvbird wrote:
What do you mean by "ANSI parser"?
ANSI = 1 byte/character
Unicode in Js = 2 byte/character (UCS2)
It is an ANSI parser because it can parse inputs from a Buffer that is an array of bytes.
Most of the time, XML is encoded in UTF-8.
UTF8 is a special ANSI code page where all unicode characters can be encoded. In UTF8 characters can be encoded on one or more bytes.
Based on this, I think you don't mean "ANSI" at all. The American National Standards Institute had nothing to do with Unicode or UTF-8. The Unicode Standard is maintained by the Unicode Consortium and is kept in sync with ISO/IEC 10646, which is published by the International Organization for Standardization. UTF-8 was designed by Ken Thompson and Rob Pike, and is specified by the IETF in an RFC (Request for Comments) document, RFC 3629.
I think you're just trying to convey the idea that, when the input contains multibyte characters, the buffer you're given might end with a partial character, and that your parser accommodates that situation, presumably by waiting to handle it until the rest of the bytes that make up the character have arrived in the buffer. That's great, and I'd just suggest that you explain it as such at the beginning of your README, and discard the "Unicode", "ANSI" or "ASCII" labels, since that's not what they convey. The first paragraph of your README should give users an understanding of what your module is for and clearly differentiate it from the alternatives.
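In case it helps with the wording, here is a small TypeScript sketch of the behaviour I'm describing. It is only my guess at the general technique, not a description of how node-xml-lite works internally, and the names `incompleteTailLength` and `Utf8ChunkSplitter` are hypothetical. It holds back the bytes of a trailing incomplete UTF-8 sequence until the next chunk arrives, and it assumes the input is otherwise valid UTF-8.

```typescript
// Number of bytes a UTF-8 sequence needs, judged by its lead byte.
// Returns 0 for continuation bytes and for bytes that cannot start a sequence.
function expectedLength(lead: number): number {
  if (lead < 0x80) return 1;
  if (lead >= 0xc2 && lead <= 0xdf) return 2;
  if (lead >= 0xe0 && lead <= 0xef) return 3;
  if (lead >= 0xf0 && lead <= 0xf4) return 4;
  return 0;
}

// How many bytes at the end of `bytes` belong to an unfinished character.
function incompleteTailLength(bytes: Uint8Array): number {
  const n = bytes.length;
  // A sequence is at most 4 bytes, so an unfinished one is at most 3.
  for (let back = 1; back <= 3 && back <= n; back++) {
    const b = bytes[n - back];
    if ((b & 0xc0) === 0x80) continue; // continuation byte: keep looking back
    return expectedLength(b) > back ? back : 0;
  }
  return 0;
}

// Hypothetical wrapper: feed it chunks, get back only complete characters.
class Utf8ChunkSplitter {
  private pending: Uint8Array = new Uint8Array(0);

  push(chunk: Uint8Array): Uint8Array {
    // Prepend whatever was held back from the previous chunk.
    const joined = new Uint8Array(this.pending.length + chunk.length);
    joined.set(this.pending, 0);
    joined.set(chunk, this.pending.length);
    // Hold back the unfinished tail, release everything before it.
    const cut = joined.length - incompleteTailLength(joined);
    this.pending = joined.slice(cut);
    return joined.slice(0, cut);
  }
}
```

If a euro sign (E2 82 AC) is split so that one chunk ends with E2 and the next starts with 82 AC, the first `push` returns everything before the E2 and the second returns the complete three-byte character followed by the rest. Node's core `string_decoder` module covers the same need when decoding Buffers to strings.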