Hello,

I'm pleased to announce pyxser-0.2r, a Python-Object to XML
serializer and deserializer. This package is completely
written in C and licensed under the LGPLv3.

The tested Python versions are 2.5.X and 2.6.X.

* home page:
http://coder.cl/software/pyxser

* hosted at:
https://sourceforge.net/projects/pyxser/

The current ChangeLog is as follows:

0.2r (2009.04.18):

Daniel Molina Wegener <dmw at coder.cl>
* Removed memory leaks concerning libxml2 usage.
* Removed memory leaks concerning Python C/API usage.
* Improved fault detection.
* Improved pointer checking deallocations.
* Added unit test test-utf8-prof.py with 50,000 cycles ;)
* Every serialization is made into unicode objects.

Feel free to use and test. You can modify test-utf8-prof.py
to use more than 50,000 cycles depending on your libxml2
version.

Best regards,
- --
.O. | Daniel Molina Wegener | FreeBSD & Linux
..O | dmw [at] coder [dot] cl | Open Standards
OOO | http://coder.cl/ | FOSS Developer


  • Stefan Behnel at Apr 19, 2009 at 6:25 am

    Daniel Molina Wegener wrote:
    * Every serialization is made into unicode objects.
    Hmm, does that mean that when I serialise, I get a unicode object back?
    What about the XML declaration? How can a user create well-formed XML from
    your output? Or is that not the intention?

    Stefan
  • Daniel Molina Wegener at Apr 19, 2009 at 4:25 pm
    Stefan Behnel <stefan_ml at behnel.de>
    on Sunday 19 April 2009 02:25
    wrote in comp.lang.python:

    Daniel Molina Wegener wrote:
    * Every serialization is made into unicode objects.
    Hmm, does that mean that when I serialise, I get a unicode object back?
    What about the XML declaration? How can a user create well-formed XML from
    your output? Or is that not the intention?
    Yes, if you serialize an object you get an XML string as a
    unicode object, since unicode objects support UTF-8 and
    some other encodings. You can also deserialize the object,
    that is, convert the XML back into a Python object tree. Take a
    look at the serializer output:

    http://coder.cl/software/pyxser/#id_example
    Stefan

    Best regards,
    - --
    .O. | Daniel Molina Wegener | FreeBSD & Linux
    ..O | dmw [at] coder [dot] cl | Open Standards
    OOO | http://coder.cl/ | FOSS Developer
  • Stefan Behnel at Apr 19, 2009 at 7:08 pm

    Daniel Molina Wegener wrote:
    Stefan Behnel <stefan_ml at behnel.de>
    on Sunday 19 April 2009 02:25
    wrote in comp.lang.python:

    Daniel Molina Wegener wrote:
    * Every serialization is made into unicode objects.
    Hmm, does that mean that when I serialise, I get a unicode object back?
    What about the XML declaration? How can a user create well-formed XML from
    your output? Or is that not the intention?
    Yes, if you serialize an object you get an XML string as a
    unicode object, since unicode objects support UTF-8 and
    some other encodings.
    That's not what I meant. I was wondering why you chose to use a unicode
    string instead of a byte string (which XML is defined for). If your only
    intention is to deserialise the unicode string into a tree, that may be
    acceptable. However, as soon as you start writing the data to a file or
    through a network pipe, or pass it to an XML parser, you'd better make it
    well-formed XML. So you either need to encode it as UTF-8 (for which you do
    not need a declaration), or you will need to encode it in a different byte
    encoding, and then prepend a declaration yourself. In any case, this is a
    lot more overhead (and cumbersome for users) than writing out a correctly
    serialised byte string directly.

    You seemed to be very interested in good performance, so I don't quite
    understand why you want to require an additional step with a relatively
    high performance impact that only makes it harder for users to use the tool
    correctly.

    Stefan
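
A minimal sketch of the two options described in the message above, written for Python 3 (where `str` plays the role of the Python 2 `unicode` type and `bytes` is the byte string). This is not pyxser code: `xml_text` stands in for a hypothetical serializer result, and `to_utf8_bytes` and `to_declared_bytes` are illustrative helper names. The standard library parser is used only to check that both byte strings parse.

```python
import xml.etree.ElementTree as ET

# Hypothetical serializer output: a text (unicode) string without an
# XML declaration. The name contains non-ASCII characters on purpose.
xml_text = '<person><name>Jos\u00e9 \u00d1and\u00fa</name></person>'


def to_utf8_bytes(text):
    """Option 1: encode as UTF-8, which needs no XML declaration."""
    return text.encode('utf-8')


def to_declared_bytes(text, encoding):
    """Option 2: encode in another byte encoding and prepend a
    declaration that names that encoding."""
    declaration = '<?xml version="1.0" encoding="%s"?>\n' % encoding
    return (declaration + text).encode(encoding)


utf8_doc = to_utf8_bytes(xml_text)
latin1_doc = to_declared_bytes(xml_text, 'iso-8859-1')

# Both byte strings can be handed to an XML parser, a file or a socket.
name_from_utf8 = ET.fromstring(utf8_doc).find('name').text
name_from_latin1 = ET.fromstring(latin1_doc).find('name').text
```

Either way the caller has to do the encoding step, which is the extra work the message argues a serializer returning bytes would avoid.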
  • Daniel Molina Wegener at Apr 20, 2009 at 2:52 am
    Stefan Behnel <stefan_ml at behnel.de>
    on Sunday 19 April 2009 15:08
    wrote in comp.lang.python:

    Daniel Molina Wegener wrote:
    Stefan Behnel <stefan_ml at behnel.de>
    on Sunday 19 April 2009 02:25
    wrote in comp.lang.python:

    Daniel Molina Wegener wrote:
    * Every serialization is made into unicode objects.
    Hmm, does that mean that when I serialise, I get a unicode object back?
    What about the XML declaration? How can a user create well-formed XML
    from your output? Or is that not the intention?
    Yes, if you serialize an object you get an XML string as a
    unicode object, since unicode objects support UTF-8 and
    some other encodings.
    That's not what I meant. I was wondering why you chose to use a unicode
    string instead of a byte string (which XML is defined for). If your only
    intention is to deserialise the unicode string into a tree, that may be
    acceptable.
    Since libxml2's default encoding is UTF-8, and most applications use
    XML encoded in UTF-8, it makes sense to define it as the default encoding
    for the generated XML. Also, if you take a little time to read the
    documentation, you will see that you can use any encoding supported by
    Python, such as latin1, aka iso-8859-1. UTF-8 is just the default encoding.

    The original intention was to have a C14N representation of Python objects,
    and according to the C14N specification, I can't use another encoding for
    the C14N representation.
    However, as soon as you start writing the data to a file or
    through a network pipe, or pass it to an XML parser, you'd better make it
    well-formed XML. So you either need to encode it as UTF-8 (for which you
    do not need a declaration),
    I repeat, it's just the default encoding. But do you know which exception
    you get with byte strings and wrongly encoded strings (think of accents and
    special characters)? Unicode objects in Python support most of the regular
    encodings.
    or you will need to encode it in a different
    byte encoding, and then prepend a declaration yourself. In any case, this
    is a lot more overhead (and cumbersome for users) than writing out a
    correctly serialised byte string directly.
    No, I'm just using the default encoding for libxml2, which can be converted
    or re-encoded to other character sets, and if you read the documentation, you
    will see that you can use most of the encodings Python supports.
    You seemed to be very interested in good performance, so I don't quite
    understand why you want to require an additional step with a relatively
    high performance impact that only makes it harder for users to use the
    tool correctly.
    Using an encoding other than libxml2's default makes the work harder for
    libxml2, since every #PCDATA section has to be re-encoded to the desired
    encoding. Comparing one string conversion in Python against many string
    conversions inside libxml2, the program performs worse with a non-default
    encoding. Also, since UTF-8 is the default encoding, building a Python
    string by passing the UTF-8 string buffer and its size does not have a
    huge impact on performance.
    Stefan
    Best regards,
    - --
    .O. | Daniel Molina Wegener | FreeBSD & Linux
    ..O | dmw [at] coder [dot] cl | Open Standards
    OOO | http://coder.cl/ | FOSS Developer
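
The question above about which exception wrongly encoded byte strings produce can be tried out with a short Python 3 sketch. It uses only the standard library, not pyxser; the helper names `decode_error_name` and `parse_error_name` are made up for illustration, and the sample data is Latin-1 bytes that are passed off as UTF-8.

```python
import xml.etree.ElementTree as ET

# "José" encoded as Latin-1: the single byte 0xE9 is not valid UTF-8 here.
latin1_bytes = 'Jos\u00e9'.encode('iso-8859-1')


def decode_error_name(data):
    """Return the name of the exception raised when decoding as UTF-8,
    or None if the bytes decode cleanly."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as exc:
        return type(exc).__name__
    return None


def parse_error_name(data):
    """Return the name of the exception raised by the XML parser,
    or None if the document parses."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return type(exc).__name__
    return None


decode_failure = decode_error_name(latin1_bytes)

# No declaration, so the parser assumes UTF-8 and rejects the 0xE9 byte.
parse_failure = parse_error_name(b'<name>' + latin1_bytes + b'</name>')
```

In both cases the failure comes from bytes whose real encoding does not match the assumed one, which is independent of whether the serializer returns text or bytes.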
  • Stefan Behnel at Apr 20, 2009 at 5:25 am

    Daniel Molina Wegener wrote:
    Using an encoding other than libxml2's default makes the work harder for
    libxml2, since every #PCDATA section has to be re-encoded to the desired
    encoding. Comparing one string conversion in Python against many string
    conversions inside libxml2, the program performs worse with a non-default
    encoding.
    It's not that much slower, though.

    http://codespeak.net/lxml/performance.html#parsing-and-serialising

    Stefan
  • Daniel Molina Wegener at Apr 20, 2009 at 3:41 am
    Stefan Behnel <stefan_ml at behnel.de>
    on Sunday 19 April 2009 15:08
    wrote in comp.lang.python:

    Daniel Molina Wegener wrote:
    Stefan Behnel <stefan_ml at behnel.de>
    on Sunday 19 April 2009 02:25
    wrote in comp.lang.python:

    Daniel Molina Wegener wrote:
    * Every serialization is made into unicode objects.
    Hmm, does that mean that when I serialise, I get a unicode object back?
    What about the XML declaration? How can a user create well-formed XML
    from your output? Or is that not the intention?
    Yes, if you serialize an object you get an XML string as a
    unicode object, since unicode objects support UTF-8 and
    some other encodings.
    That's not what I meant. I was wondering why you chose to use a unicode
    string instead of a byte string (which XML is defined for). If your only
    intention is to deserialise the unicode string into a tree, that may be
    acceptable. However, as soon as you start writing the data to a file or
    through a network pipe, or pass it to an XML parser, you'd better make it
    well-formed XML. So you either need to encode it as UTF-8 (for which you
    do not need a declaration), or you will need to encode it in a different
    byte encoding, and then prepend a declaration yourself. In any case, this
    is a lot more overhead (and cumbersome for users) than writing out a
    correctly serialised byte string directly.
    Sorry, it appears that I misunderstood your question. By /unicode
    objects/ I mean /python unicode objects/ aka /python unicode strings/.
    Most of them can be re-encoded into /latin*/ strings and then /ascii/
    strings if that is what you want. But for most communications, such as
    with Java systems, utf-8 encoding is the default. I wrote pyxser to
    provide interoperability between Python and other systems.
    You seemed to be very interested in good performance, so I don't quite
    understand why you want to require an additional step with a relatively
    high performance impact that only makes it harder for users to use the
    tool correctly.

    Stefan
    Regards,
    - --
    .O. | Daniel Molina Wegener | FreeBSD & Linux
    ..O | dmw [at] coder [dot] cl | Open Standards
    OOO | http://coder.cl/ | FOSS Developer
  • Stefan Behnel at Apr 20, 2009 at 5:24 am

    Daniel Molina Wegener wrote:
    Sorry, it appears that I misunderstood your question. By /unicode
    objects/ I mean /python unicode objects/ aka /python unicode strings/.
    Yes, that's exactly what I'm talking about. Maybe you should read up on
    what Unicode is.

    Most of them can be re-encoded into /latin*/ strings and then /ascii/
    strings if that is what you want. But for most communications, such as
    with Java systems, utf-8 encoding is the default.
    Well, then do not output a Python unicode string, but a UTF-8 encoded byte
    string as the default. Except for a couple of cases, Python unicode strings
    are very inconvenient for serialised XML.

    Stefan
  • Daniel Molina Wegener at Apr 20, 2009 at 4:30 pm

    Stefan Behnel wrote:

    Daniel Molina Wegener wrote:
    Sorry, it appears that I misunderstood your question. By /unicode
    objects/ I mean /python unicode objects/ aka /python unicode strings/.
    Yes, that's exactly what I'm talking about. Maybe you should read up on
    what Unicode is.
    OK, it seems the better option is to return both types from different
    functions, which will let users choose whichever fits their development
    needs.
    Most of them can be re-encoded into /latin*/ strings and then /ascii/
    strings if that is what you want. But for most communications, such as
    with Java systems, utf-8 encoding is the default.
    Well, then do not output a Python unicode string, but a UTF-8 encoded byte
    string as the default. Except for a couple of cases, Python unicode
    strings are very inconvenient for serialised XML.
    OK, good point. I must take a look at the implementation and, as I've
    said, I will implement both return types in different functions to give
    users the choice, and document the impact of using Python unicode strings.

    Thanks for your feedback :D
    Stefan
    Best regards,
    - --
    .O.| Daniel Molina Wegener | C/C++ Developer
    ..O| dmw [at] coder [dot] cl | FreeBSD & Linux
    OOO| http://coder.cl/ | Standards Basis
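
The "both return types in different functions" idea from the last message could look roughly like the following Python 3 sketch. It is built on the standard library's ElementTree rather than on pyxser, and `serialize_bytes` and `serialize_text` are hypothetical names chosen for illustration, not functions that pyxser is known to provide.

```python
import xml.etree.ElementTree as ET


def serialize_bytes(element, encoding='utf-8'):
    """Return the serialised XML as an encoded byte string, ready to be
    written to a file or sent over a network connection."""
    return ET.tostring(element, encoding=encoding)


def serialize_text(element):
    """Return the serialised XML as a text (unicode) string, for callers
    that want to keep working with it inside Python."""
    return ET.tostring(element, encoding='unicode')


# Build a small tree containing a non-ASCII character.
root = ET.Element('person')
ET.SubElement(root, 'name').text = 'Jos\u00e9'

as_bytes = serialize_bytes(root)
as_text = serialize_text(root)

# The byte string can go straight back into a parser.
round_trip_name = ET.fromstring(as_bytes).find('name').text
```

Keeping the byte-returning function as the one meant for output and the text-returning function as the opt-in variant follows the suggestion made earlier in the thread.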

Discussion Overview
group: python-list @
categories: python
posted: Apr 19, '09 at 6:05a
active: Apr 20, '09 at 4:30p
posts: 9
users: 2
website: python.org
