Hi all,

I'm trying to extract some information from an HTML file using
Beautiful Soup. The strings I want to get are after br tags, e.g.:

<font size='6'>
<br>this info
<br>more info
<br>and more info
</font>

I can navigate to the first br tag using find_next_sibling, but how do
I get the string after the br's?
br.contents is empty.

Thanks for any ideas.

  • Erik Max Francis at Mar 10, 2006 at 11:39 pm

    meyerkp at gmail.com wrote:

    > I'm trying to extract some information from an HTML file using
    > Beautiful Soup. The strings I want to get are after br tags, e.g.:
    >
    > <font size='6'>
    > <br>this info
    > <br>more info
    > <br>and more info
    > </font>
    >
    > I can navigate to the first br tag using find_next_sibling, but how do
    > I get the string after the br's?
    > br.contents is empty.

    I'm not familiar with Beautiful Soup specifically, but this isn't how
    the <br> tag works. Unlike tags such as <li> or <p>, whose closing tags
    are optional in HTML, <br> never contains anything; it's just a line
    break. In XHTML it would be written <br />, indicating that it's a
    standalone tag.

    Instead you want to traverse the contents of the font tag, taking into
    account line breaks that you encounter.

    --
    Erik Max Francis && max at alcyone.com && http://www.alcyone.com/max/
    San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
    Fear is an emotion indispensable for survival.
    -- Hannah Arendt
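
The traversal suggested above (walk the contents of the font tag and treat each <br> as a separator) can be sketched without Beautiful Soup at all, using the html.parser module from the Python 3 standard library. The class name BrTextCollector and the helper text_after_breaks are made up for this illustration, and the sketch assumes the text of interest sits directly inside a single, non-nested font element as in the sample from the question.

```python
from html.parser import HTMLParser


class BrTextCollector(HTMLParser):
    """Collect the text that follows each <br> inside a <font> element."""

    def __init__(self):
        super().__init__()
        self.in_font = False   # True while between <font> and </font>
        self.after_br = False  # True once a <br> has been seen in this font
        self.lines = []        # one entry per <br> encountered

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            self.in_font = True
        elif tag == "br" and self.in_font:
            # <br> is an empty element, so it only marks where a new line starts
            self.after_br = True
            self.lines.append("")

    def handle_endtag(self, tag):
        if tag == "font":
            self.in_font = False
            self.after_br = False

    def handle_data(self, data):
        # Text belongs to the most recent <br>, not to the <br> tag itself
        if self.in_font and self.after_br:
            self.lines[-1] += data


def text_after_breaks(html):
    """Return the non-empty, stripped strings found after each <br>."""
    parser = BrTextCollector()
    parser.feed(html)
    parser.close()
    return [line.strip() for line in parser.lines if line.strip()]


sample = """<font size='6'>
<br>this info
<br>more info
<br>and more info
</font>"""

print(text_after_breaks(sample))
```

The point of the sketch is that the strings are children of the font element, siblings of the br tags, which is why br.contents comes back empty.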
  • Enigma Curry at Mar 11, 2006 at 4:28 am
    Here's how I print each line after the <br>'s:

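    import BeautifulSoup as Soup

    # Read the page and parse it (Python 2, pre-bs4 BeautifulSoup module)
    page = open("test.html").read()
    soup = Soup.BeautifulSoup(page)

    # br.next is the node that follows each <br>, here the text after it
    for br in soup.fetch('br'):
        print br.next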
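
For readers on Python 3, the same idea can probably be written against the current bs4 package roughly as follows. This is a sketch rather than a port endorsed by anyone in the thread: it assumes bs4 is installed, uses the built-in "html.parser" backend, and the function name strings_after_br is invented for the example. It uses next_sibling rather than the older br.next, and skips a <br> that is followed by another tag instead of text.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def strings_after_br(html):
    """Return the stripped text node that directly follows each <br>."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for br in soup.find_all("br"):
        node = br.next_sibling
        # Keep only text nodes; a <br> followed by a tag or nothing is skipped
        if isinstance(node, NavigableString) and node.strip():
            results.append(node.strip())
    return results


sample = """<font size='6'>
<br>this info
<br>more info
<br>and more info
</font>"""

for text in strings_after_br(sample):
    print(text)
```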

Discussion Overview
group: python-list @
category: python
posted: Mar 10, '06 at 11:22p
active: Mar 11, '06 at 4:28a
posts: 3
users: 3
website: python.org
