Hello,

I would be interested in parsing an HTML file by its corresponding
opening and closing tags, while taking into account the class
attributes and their values:

<html>
<body>
...
<div class="one">
...
<div class="two">
</div>
...
</div>
...
<div class="one">...</div>
<a href="..." class="three">
</body>
</html>

In this example, I need all the content inside the div with class="two",
or only the content inside the divs with class="one".

So I am wondering whether I should go with regular expressions, but I do
not think so, since I would have to skip past the inner closing div. The
alternative is a simple parser. I searched and found
http://www.diveintopython.org/html_processing/basehtmlprocessor.html
but I would like the parser not to change anything at all (no
lowercasing).

Can you help?

Best.


  • Gatti at Feb 23, 2007 at 8:34 am

    On Feb 23, 8:54 am, lorean2... at yahoo.fr wrote:
    Hello,

    I would be interested in parsing an HTML file by its corresponding
    opening and closing tags, while taking into account the class
    attributes and their values, [...]
    So I am wondering whether I should go with regular expressions, but I do
    not think so, since I would have to skip past the inner closing div. The
    alternative is a simple parser. I searched and found
    http://www.diveintopython.org/html_processing/basehtmlprocessor.html
    but I would like the parser not to change anything at all (no
    lowercasing).

    Regular expressions are a horribly brittle idea for this. Use a robust
    HTML parser (e.g. http://www.crummy.com/software/BeautifulSoup/) to
    build a document tree, then visit it top-down and look at the value of
    the 'class' attributes.

    Regards,
    Lorenzo Gatti
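
For readers who want the matched content back exactly as written (no lowercasing, no re-serialization), here is a minimal sketch. It does not use BeautifulSoup as suggested above; it swaps in the standard library's html.parser instead, using the parser only to locate tag positions and then slicing the original string. The class name DivContentExtractor, its results attribute, and the SAMPLE document are hypothetical names made up for illustration. It assumes the divs are properly nested and closed; html.parser does not repair broken markup.

```python
from html.parser import HTMLParser


class DivContentExtractor(HTMLParser):
    """Collect the raw source text inside <div> elements of a given class.

    Hypothetical helper for illustration: the parser is used only to find
    where tags start and end; the results are slices of the original
    string, so case and whitespace are left untouched.
    """

    def __init__(self, source, wanted_class):
        super().__init__()
        self.source = source
        self.wanted_class = wanted_class
        self.results = []
        self._depth = 0      # <div> nesting depth inside the current match
        self._start = None   # absolute offset where matched content begins
        # Offset of the first character of each line, used to turn the
        # parser's (line, column) position into an index into `source`.
        self._line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self._line_starts.append(i + 1)
        self.feed(source)
        self.close()

    def _offset(self):
        # getpos() reports the start of the tag currently being handled.
        line, column = self.getpos()
        return self._line_starts[line - 1] + column

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._start is not None:
            # Already inside a match: just track nesting.
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            # Content begins right after the raw text of the opening tag.
            self._start = self._offset() + len(self.get_starttag_text())
            self._depth = 1

    def handle_endtag(self, tag):
        if tag != "div" or self._start is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Content ends where the matching closing tag begins.
            self.results.append(self.source[self._start:self._offset()])
            self._start = None


SAMPLE = """<html>
<body>
<DIV class="one">
Outer <B>Text</B>
<div class="two">Inner</div>
</DIV>
<div class="one">Second</div>
</body>
</html>
"""

inner = DivContentExtractor(SAMPLE, "two").results
outer = DivContentExtractor(SAMPLE, "one").results
```

With this sample, inner holds the single string "Inner", and outer holds two strings, the first of which still contains the uppercase <B> tag and the nested class="two" div verbatim. Tag and attribute names are lowercased only inside the parser callbacks, so matching on "div" also finds <DIV>, while the class value itself is compared case-sensitively.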

Discussion Overview
group: python-list
category: python
posted: Feb 23, '07 at 7:54a
active: Feb 23, '07 at 8:34a
posts: 2
users: 2
website: python.org
