Aloha,

I'm trying to write an XML filter that extracts some information about
an .xml document (with external entities), especially start elements and
external entities. The document is DocBook XML and, as far as I can see,
well-formed; it passes our DocBook toolchain (dblatex etc.).

My parser (very simple):
[115] scylla(scylla)> more pbxml.py

class xmlhandle:
    def __init__(self):
        self.parser_stack = []
        self.parser = None

    def se(self, name, attr):
        print "s", self.parser.CurrentLineNumber, name, attr

    def ex(self, context, baseid, n1, n2):
        print "x", context, n1, n2

def fromxml(fname):
    import xml.parsers.expat
    p = xml.parsers.expat.ParserCreate()
    xl = xmlhandle()
    p.StartElementHandler = xl.se
    p.ExternalEntityRefHandler = xl.ex
    xl.parser = p
    p.ParseFile(file(fname))
    return

if __name__ == "__main__":
    import sys
    fromxml(sys.argv[1])

My document (in two parts):

[116] scylla(scylla)> more s3.xml
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"/usr/share/xml/docbook/xml/4.2/docbookx.dtd"
[
<!ENTITY bookinfo SYSTEM "bookinfo.xml">
]>
<book>
&bookinfo;
<chapter id="technicalDescription"><title>technical description</title>
<para>
This chapter includes specification of the main simulation loop.
</para>
</chapter>
</book>

[118] scylla(scylla)> more bookinfo.xml
<bookinfo>
<title>BookTitle</title>
<authorgroup>
<author>
<firstname>A</firstname>
<surname>B</surname>
</author>
</authorgroup>
</bookinfo>

The run produces:

[120] scylla(scylla)> python pbxml.py s3.xml
s 7 book {}
x bookinfo bookinfo.xml None
s 9 chapter {u'id': u'technicalDescription'}
s 9 title {}
s 10 para {}
Traceback (most recent call last):
  File "pbxml.py", line 25, in ?
    fromxml(sys.argv[1])
  File "pbxml.py", line 20, in fromxml
    p.ParseFile(file(fname))
TypeError: an integer is required

Does anyone have an idea where the error is produced?
Does anyone have an idea how to debug this (if it's really a bug, or a
misunderstanding of expat on my part)?

Hoping for an answer and wishing a happy day,
LOBI
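One way to approach the "how to debug this" question is to trace end elements as well as start elements, so that the last event recorded shows how far expat got before the failure. The following is a minimal Python 3 sketch rather than a fix to the Python 2 script above; the `trace` function and its `s`/`e` line format (modeled on the output above) are illustrative assumptions, and the document is passed in as bytes instead of being read from a file.

```python
import xml.parsers.expat


def trace(data):
    """Parse `data` (bytes) and return one trace line per start/end element.

    Hypothetical debugging helper: if parsing fails, the exception is
    appended as the final line, right after the last event that succeeded.
    """
    parser = xml.parsers.expat.ParserCreate()
    lines = []

    def start(name, attrs):
        # Marker, line number of the start tag, tag name, attribute dict
        lines.append("s %d %s %r" % (parser.CurrentLineNumber, name, attrs))

    def end(name):
        # Marker, line number of the end tag, tag name
        lines.append("e %d %s" % (parser.CurrentLineNumber, name))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    try:
        parser.Parse(data, True)
    except Exception as exc:
        # Exceptions raised by expat or by a handler propagate out of Parse
        lines.append("! %s: %s" % (type(exc).__name__, exc))
    return lines


for line in trace(b'<book>\n<chapter id="x"><title>t</title>\n</chapter>\n</book>'):
    print(line)
```

Because the exception line is recorded after the last successful event, this narrows the failure down to a region of the input even when the error itself is raised from inside `Parse`.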


  • Lawrence D'Oliveiro at Aug 26, 2007 at 10:09 am

    In message <fak7q4$ahn$1 at daniel-new.mch.sbs.de>, Andreas Lobinger wrote:

    > Anyone any idea where the error is produced?
    Do you want to try adding an EndElementHandler as well, just to get more
    information on where the error might be happening?
  • Andreas Lobinger at Aug 27, 2007 at 12:31 pm
    Aloha,

    Lawrence D'Oliveiro wrote:
    > In message <fak7q4$ahn$1 at daniel-new.mch.sbs.de>, Andreas Lobinger wrote:
    >> Anyone any idea where the error is produced?
    > Do you want to try adding an EndElementHandler as well, just to get more
    > information on where the error might be happening?
    I do.

    After adding an EndElementHandler (left as an exercise to the reader),
    the output looks like this:
    [42] scylla(scylla)> python pbxml.py s3.xml
    s 7 book {}
    x bookinfo bookinfo.xml None
    s 9 chapter {u'id': u'technicalDescription'}
    s 9 title {}
    e title
    s 10 para {}
    e para
    e chapter
    e book
    Traceback (most recent call last):
      File "pbxml.py", line 29, in ?
        fromxml(sys.argv[1])
      File "pbxml.py", line 24, in fromxml
        p.ParseFile(file(fname))
    TypeError: an integer is required

    This shows me that the error is raised after parsing </book>, BUT
    still within p.ParseFile (inside expat), so I can't look
    into it.

    The example here may be misleading. It was stripped down from
    a fairly large DocBook XML file, and there the error happened in the
    middle of the document, not at the end.

    Wishing a happy day,
    LOBI
  • Andreas Lobinger at Aug 28, 2007 at 12:36 pm
    Aloha,

    Andreas Lobinger wrote:
    > Lawrence D'Oliveiro wrote:
    >> In message <fak7q4$ahn$1 at daniel-new.mch.sbs.de>, Andreas Lobinger wrote:
    >>> Anyone any idea where the error is produced?
    ... to share my findings with you:

    def ex(self, context, baseid, n1, n2):
        print "x", context, n1, n2
        return 1

    The registered handler has to return an (integer) value.
    It would have been nice if this had been mentioned in the documentation.

    Wishing a happy day,
    LOBI
  • Andreas Lobinger at Aug 28, 2007 at 12:37 pm
    Aloha,

    Andreas Lobinger wrote:
    > Andreas Lobinger wrote:
    >> Lawrence D'Oliveiro wrote:
    >>> In message <fak7q4$ahn$1 at daniel-new.mch.sbs.de>, Andreas Lobinger wrote:
    >>>> Anyone any idea where the error is produced?
    > The registered Handler has to return a (integer) value.
    > Would have been nice if this had been mentioned in the documentation.
    Please ignore that last line; it is mentioned in the documentation.
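For completeness, here is a minimal Python 3 sketch of an ExternalEntityRefHandler that follows the documented contract: it creates a sub-parser with `ExternalEntityParserCreate(context)`, parses the entity's content with it, and returns 1 so that expat continues. Per the documentation, returning 0 makes expat report an external-entity-handling error, and a handler that simply falls off the end returns None, which is what produced the TypeError in this thread. The in-memory `ENTITIES` dict (standing in for bookinfo.xml on disk), the trimmed-down `DOC`, and the helper names are hypothetical.

```python
import xml.parsers.expat

# Hypothetical in-memory stand-ins for s3.xml and bookinfo.xml
# (assumption: entity content is looked up in a dict, not read from disk).
ENTITIES = {
    "bookinfo.xml": b"<bookinfo><title>BookTitle</title></bookinfo>",
}

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE book [
<!ENTITY bookinfo SYSTEM "bookinfo.xml">
]>
<book>
&bookinfo;
<chapter id="technicalDescription"><title>technical description</title></chapter>
</book>
"""


def parse_with_entities(doc, entities):
    """Parse `doc`, expanding external entities; return ("s"/"x", name) events."""
    events = []

    def start_element(name, attrs):
        events.append(("s", name))

    def install_handlers(parser):
        def external_entity_ref(context, base, system_id, public_id):
            events.append(("x", system_id))
            # The sub-parser continues in the parent's context (declared entities etc.)
            sub = parser.ExternalEntityParserCreate(context)
            install_handlers(sub)
            sub.Parse(entities[system_id], True)
            # expat needs an integer back: nonzero means "handled, keep going"
            return 1

        parser.StartElementHandler = start_element
        parser.ExternalEntityRefHandler = external_entity_ref

    root = xml.parsers.expat.ParserCreate()
    install_handlers(root)
    root.Parse(doc, True)
    return events


print(parse_with_entities(DOC, ENTITIES))
```

Installing the handlers through one helper means nested entities (an entity that itself references another external entity) are handled the same way as top-level ones.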

Discussion Overview
group: python-list @
categories: python
posted: Aug 23, '07 at 3:06p
active: Aug 28, '07 at 12:37p
posts: 5
users: 2
website: python.org
