I've had fairly good experiences with JTidy!

HTMLParser (http://htmlparser.sourceforge.net/), however,
seems to have the lighter-looking API. It is event-based,
and I may need to parse some large HTML documents
soon, where building a DOM could be a problem. Does anyone
have practical experience with HTMLParser?

Thanks
Frank
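
To illustrate the event-based style mentioned above, here is a minimal sketch. It does not use HTMLParser or JTidy; it assumes the callback parser that ships with the JDK (javax.swing.text.html), chosen only so the example runs without extra libraries. The class names EventTextExtractor and TextCollector are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical example class, not part of HTMLParser, JTidy, or Lucene.
class EventTextExtractor {

    /** Receives parser events one at a time; no document tree is built. */
    static final class TextCollector extends HTMLEditorKit.ParserCallback {
        final List<String> chunks = new ArrayList<>();
        int startTags = 0;

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            startTags++; // called once per opening tag the parser reports
        }

        @Override
        public void handleText(char[] data, int pos) {
            String text = new String(data).trim();
            if (!text.isEmpty()) {
                chunks.add(text); // a real indexer could consume the text here instead
            }
        }
    }

    /** Streams the HTML from the reader through the callback. */
    static TextCollector parse(Reader html) throws IOException {
        TextCollector collector = new TextCollector();
        // third argument: ignore any charset declared inside the document
        new ParserDelegator().parse(html, collector, true);
        return collector;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p>Hello <b>world</b></p></body></html>";
        TextCollector result = parse(new StringReader(html));
        System.out.println(String.join(" ", result.chunks));
        System.out.println("start tags seen: " + result.startTags);
    }
}
```

The sketch collects text in a list only to keep the example short. For large inputs, the callback would more likely pass each chunk straight on (for example to an indexer), so that memory use does not grow with document size.
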
-----Original Message-----
From: petite_abeille
Sent: Tuesday, February 25, 2003 19:49
To: Lucene Users List
Subject: Re: Best HTML Parser !!


On Monday, Feb 24, 2003, at 20:28 Europe/Zurich, Lukas Zapletal wrote:

> I have had some good experiences with JTidy. It works like a
> DOM-XML parser and cleans up the HTML along the way.

I use JTidy also, both for parsing and for clean-up. It works pretty nicely.

> This is VERY useful, because EVERY HTML document has at least ONE error.

This rule should be tattooed on every parser's head: outside the
laboratory, nothing is compliant. Which renders the race to "more
compliance" among the different parsers somewhat ridiculous.

Cheers,

PA.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


  • Cesar Estevez at Feb 26, 2003 at 8:51 pm
    Hello!
    I am using Lucene to build a search application, using the Porter stemming algorithm for the
    Galician language (Galicia, Spain). I need to parse a PhraseQuery with my
    analyzer and would like to know whether this is possible.
    Thank you.


Discussion Overview
group: java-user
categories: lucene
posted: Feb 26, '03 at 9:00a
active: Feb 26, '03 at 8:51p
posts: 2
users: 2
website: lucene.apache.org
