Hi,
I have some HTML that essentially consists of a series of <div>s,
each <div> having one of two classes (tnt-question or tnt-answer).
I'm using HTMLParser to handle the tags as follows:

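import HTMLParser  # Python 2 standard-library module

class MyHTMLParser(HTMLParser.HTMLParser):

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if len(attrs) == 1:
            cls, whichcls = attrs[0]
            if whichcls == 'tnt-question':
                print self.get_starttag_text(), self.getpos()

    def handle_endtag(self, tag):
        pass

    def handle_data(self, data):
        # called for the text inside every tag, not only tnt-question
        print data

if __name__ == '__main__':
    # read the whole file at once instead of joining lines with spaces
    htmldata = open('tt.html', 'r').read()
    parser = MyHTMLParser()
    parser.feed(htmldata)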

However what I would like is that when the parser reaches some HTML like
this:

<div class="tnt-question">
How do I add a user to a MySQL system?
</div>

I should get back the data between the open and close tags. However the
above code prints the text contained between all tags, not just the <div>
tags with the class='tnt-question'.

Is there a way to call handle_data() when a specific tag is being handled?
Placing a call to handle_data() in handle_starttag seems to be the way -
but I'm not sure how to actually do it - what data should I pass to the
call?

Any pointers would be appreciated.
Thanks,
Rajarshi

  • Benjamin Niemann at Aug 19, 2004 at 3:51 pm

    Rajarshi Guha wrote:
    Is there a way to call handle_data() when a specific tag is being handled?
    Placing a call to handle_data() in handle_starttag seems to be the way -
    but I'm not sure how to actually do it - what data should I pass to the
    call?

    Set a flag when the parser calls handle_starttag() and the tag
    matches your criteria, and unset it when the corresponding end tag is
    found. You'll probably have to count the nesting depth, so that for
    <div class="printme">Yo <div>man</div>!</div>
    the flag is unset on the second </div>. Then, in handle_data(), print
    the data only when the flag is set.
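
The flag-and-depth idea described above could be sketched roughly as follows. This is only an illustration, not code from the thread: it assumes Python 3's html.parser module rather than the Python 2 HTMLParser module used in the question, and the names QuestionParser, target_class, depth, chunks and questions are made up for the example. The sample HTML fed to the parser is likewise invented, apart from the two snippets quoted in the posts.

```python
from html.parser import HTMLParser


class QuestionParser(HTMLParser):
    """Collect the text inside <div> elements carrying a given class."""

    def __init__(self, target_class="tnt-question"):
        super().__init__()
        self.target_class = target_class  # hypothetical parameter name
        self.depth = 0       # 0 means the flag is unset; >0 counts open <div>s
        self.chunks = []     # text pieces of the <div> currently being captured
        self.questions = []  # finished results, one string per matching <div>

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # a nested <div> inside a captured one: just count it
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            # matching <div>: set the flag
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # the </div> that closes the matching <div>: unset the flag
                self.questions.append("".join(self.chunks).strip())
                self.chunks = []

    def handle_data(self, data):
        # keep text only while the flag is set
        if self.depth > 0:
            self.chunks.append(data)


parser = QuestionParser()
parser.feed(
    '<div class="tnt-question">\nHow do I add a user to a MySQL system?\n</div>'
    '<div class="tnt-answer">This text is skipped.</div>'
    '<div class="tnt-question">Yo <div>man</div>!</div>'
)
parser.close()
print(parser.questions)
```

Using a counter instead of a plain boolean is what lets the nested <div> case work: the inner </div> only decrements the counter, and capturing stops at the outer </div>.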

Discussion Overview
group: python-list @
categories: python
posted: Aug 19, '04 at 3:27p
active: Aug 19, '04 at 3:51p
posts: 2
users: 2
website: python.org
