Using Python 2.1 and PyXML 0.7.1, I'm having some difficulty with XML
documents that use a DTD. What I want to do is parse an XML document whose
doctype declaration specifies the DTD, and validate it. I then need to
manipulate the document a bit (keeping it valid) and spit it back out to a
file. If I parse with validation, the validation takes place, but the
resulting document contains an empty root node. I get the whole document if
I parse without validation, but then the doctype declaration no longer contains
a systemId when I write the document back out after manipulation. Shouldn't
parsing with and without validation return the same document object (assuming
the document is valid to begin with)? And shouldn't the non-validating parser
preserve the doctype declaration in the resulting document instance (even
though it doesn't use it to validate the XML)?

Chris


-- t1.xml --
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE a SYSTEM "t1.dtd">
<a>
<b>simple test</b>
</a>

-- t1.dtd --
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT a (b)>
<!ELEMENT b (#PCDATA)>

-- test code --
import xml.dom.ext.reader.Sax2 as Sax2
from xml.dom.ext import PrettyPrint as PPrint

ValReader = Sax2.Reader(validate=1)     # validating reader (xmlproc)
NonValReader = Sax2.Reader(validate=0)  # non-validating reader (expat)
vd = ValReader.fromStream(open('t1.xml'))
nvd = NonValReader.fromStream(open('t1.xml'))

PPrint(vd)   # this shows vd to have an empty root
PPrint(nvd)  # this shows nvd's doctype declaration without its SYSTEM id

-- output of PPrint(vd) --
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE a SYSTEM "t1.dtd">
<a/>

-- output of PPrint(nvd) --
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE a>
<a>
<b>simple test</b>
</a>
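For comparison, here is a minimal sketch of the same parse / check / edit / write-back workflow using Python 3's standard library rather than the PyXML 0.7.1 API above. It assumes `xml.dom.minidom`, whose builder sits on expat. The standard library has no DTD validator, so `matches_t1_dtd` is a hypothetical hand-written stand-in for the two declarations in t1.dtd, not real validation; `replace_b_text` and `round_trip` are likewise illustrative names. In current CPython, minidom's expat-based builder records the DOCTYPE name and SYSTEM identifier, so they should reappear in the serialized output.

```python
import xml.dom.minidom

# t1.xml from the post, inlined so the sketch runs without files on disk.
# The SYSTEM identifier "t1.dtd" is recorded, but the DTD itself is not loaded.
T1_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE a SYSTEM "t1.dtd">
<a>
<b>simple test</b>
</a>
"""


def element_children(node):
    """Return only the element children of a DOM node."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def matches_t1_dtd(doc):
    """Hypothetical hand-coded check of t1.dtd's two declarations.

    This is NOT a general DTD validator; it only encodes:
      <!ELEMENT a (b)>       root <a> holds exactly one <b> and no text
      <!ELEMENT b (#PCDATA)> <b> holds character data only
    """
    root = doc.documentElement
    if root.tagName != "a":
        return False
    kids = element_children(root)
    if len(kids) != 1 or kids[0].tagName != "b":
        return False
    # Element content may contain whitespace, but no other character data.
    for c in root.childNodes:
        if c.nodeType in (c.TEXT_NODE, c.CDATA_SECTION_NODE) and c.data.strip():
            return False
    # #PCDATA: text is fine, child elements are not.
    return not element_children(kids[0])


def replace_b_text(doc, text):
    """Example edit that keeps the document valid: swap <b>'s text."""
    b = element_children(doc.documentElement)[0]
    for c in list(b.childNodes):
        b.removeChild(c)
    b.appendChild(doc.createTextNode(text))


def round_trip(xml_text, new_text):
    """Parse, check, edit, re-check, and serialize; return (systemId, xml)."""
    doc = xml.dom.minidom.parseString(xml_text)
    if not matches_t1_dtd(doc):
        raise ValueError("input does not match t1.dtd")
    replace_b_text(doc, new_text)
    if not matches_t1_dtd(doc):
        raise ValueError("edit broke the t1.dtd content model")
    # doc.doctype still carries the name and SYSTEM id reported by the parser.
    return doc.doctype.systemId, doc.toxml()


if __name__ == "__main__":
    system_id, text = round_trip(T1_XML, "changed")
    print(system_id)
    print(text)
```

Swapping in a real validator (for example the DTD support in the third-party lxml package) would replace `matches_t1_dtd`; the round-trip part stays the same.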


  • Martin v. Loewis at May 30, 2002 at 7:19 am

    "Chris Prinos" <cprinos at foliage.com> writes:

    > If I parse using validation, the validation takes place, but the
    > resulting document contains an empty root node.

    That looks like a bug in xmlproc; please report it at
    sf.net/projects/pyxml.

    > Shouldn't parsing with and without validation return the same
    > document object (assuming it's valid to begin with)?

    Not necessarily. In this specific case, the XML parsers used (xmlproc
    and expat) have completely different code bases, so their behaviour
    can easily differ as well.

    > And shouldn't the non-validating parser maintain the doctype
    > declaration in the resulting document instance (even if it's not
    > used by the parser to validate the xml)?

    Perhaps, but it just so happens that the underlying parser does not
    report the doctype, so PyXML cannot record it. You may want to report
    this as a bug at sf.net/projects/expat.

    Regards,
    Martin
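Martin's answer turns on what the underlying parser reports. A minimal sketch, using the `xml.parsers.expat` (pyexpat) module in today's Python 3 standard library rather than the PyXML 0.7.1 SAX reader from the thread, of how to check at the parser level whether expat reports a document's DOCTYPE name and SYSTEM identifier through `StartDoctypeDeclHandler`. The helper name `reported_doctype` is hypothetical.

```python
import xml.parsers.expat


def reported_doctype(data):
    """Return (name, systemId, publicId) as reported by expat, or None.

    pyexpat calls StartDoctypeDeclHandler with
    (doctypeName, systemId, publicId, has_internal_subset).
    """
    seen = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Record only what the parser itself passes up.
        seen.append((name, system_id, public_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)  # True: this is the final (only) chunk
    return seen[0] if seen else None


if __name__ == "__main__":
    t1 = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<!DOCTYPE a SYSTEM "t1.dtd">\n'
          '<a>\n<b>simple test</b>\n</a>\n')
    print(reported_doctype(t1))
```

If a check like this shows the parser reporting the SYSTEM identifier while a DOM built on the same parser still drops it, the loss is happening in the DOM-building layer rather than in the parser.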

Discussion Overview
group: python-list
categories: python
posted: May 30, '02 at 1:16 am
active: May 30, '02 at 7:19 am
posts: 2
users: 2
website: python.org

2 users in discussion: Chris Prinos (1 post), Martin v. Loewis (1 post)
