
Luca Calderano wrote:
Hi guys...

I've written a subclass of SGMLParser
to handle the contents of a web page,
but I'm not able to handle the <BR> tag.

Can someone help me?

S.G.A S.p.A.
Nucleo Sistemi Informativi
Luca Calderano

I do not know SGMLParser, but HTML as found on the web is rarely valid
SGML. HTML 4 is formally defined as an SGML application, yet one seldom
finds it "pure" (written the way the spec says it MUST be).

I believe a generic SGML parser that does not read the DTD has trouble
with tags that have no closing tag. BR is one of several such tags in
HTML (see also IMG and HR).

Depending on what you are doing, you might use XHTML as input if you
can (XHTML is HTML written as well-formed XML, and XML is a subset of
SGML), or look for a completely different parser "technology".
HTMLParser may help you a little more.
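
To make the HTMLParser suggestion concrete, here is a minimal sketch, not
a tested drop-in for the original SGMLParser subclass. It assumes a current
Python, where the class lives in html.parser (at the time of this thread it
was the HTMLParser module). The names TextWithBreaks and extract_text are
made up for illustration. The parser lowercases tag names, and <br/> is
routed through handle_starttag by default, so one check covers <BR>, <br>
and <br/>.

```python
from html.parser import HTMLParser


class TextWithBreaks(HTMLParser):
    """Collect the text of a page, turning every BR tag into a newline."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; BR has no closing tag, so the
        # start tag is the only event we get for it.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        return "".join(self.parts)


def extract_text(html):
    """Hypothetical helper: feed one HTML string and return its text."""
    parser = TextWithBreaks()
    parser.feed(html)
    parser.close()  # flush anything still buffered
    return parser.text()
```

Calling extract_text("one<BR>two") would give the two words separated by a
newline.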

Do not forget that HTML downloaded at random from the Internet is often
broken. You may prefer to run it through tidylib (which corrects broken
HTML into XHTML) and then use an XHTML/SGML parser or a DOM.
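
A rough sketch of the XHTML route, assuming the input has already been
cleaned into well-formed XHTML (for example by tidy; that step is not shown
here). It uses the standard library's xml.etree.ElementTree rather than
tidylib itself, and the function name text_with_breaks is made up. In
well-formed XHTML, <br/> is just an ordinary empty element, so it needs no
special casing in the parser.

```python
import xml.etree.ElementTree as ET


def text_with_breaks(xhtml):
    """Return the text of a well-formed XHTML fragment, with br as newline."""
    root = ET.fromstring(xhtml)  # raises ET.ParseError on unclosed tags
    parts = []

    def walk(elem):
        # Strip a namespace prefix such as {http://www.w3.org/1999/xhtml}
        name = elem.tag.rsplit("}", 1)[-1]
        if name == "br":
            parts.append("\n")
        if elem.text:
            parts.append(elem.text)
        for child in elem:
            walk(child)
            if child.tail:  # text that follows the child's closing tag
                parts.append(child.tail)

    walk(root)
    return "".join(parts)
```

A plain HTML-style <br> with no closing slash makes ET.fromstring raise
ParseError, which is why the cleanup step has to come first.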

Hope it helps even though the effort I took to check my statements was
small :)

Regards,
Ben.

Discussion Overview
group: python-list
categories: python
posted: Jul 31, '03 at 1:10p
active: Aug 5, '03 at 4:43p
posts: 6
users: 5
website: python.org
