On May 3, 2005, at 4:35 AM, Bartosch Warzecha wrote:
Hello,

I'm building a search engine for HTML documents, and I've got an
HTML-parsing problem.

These documents are in German. They contain various special
characters, and different ways of writing them, such as "ö",
"&ouml;" and "&#246". Does anybody know a parsing engine that has no
problems with all these different ways of writing special characters?

What HTML parser are you using? Those entity references should not
be seen by your code once resolved by a parser. Try NekoHTML.

Erik
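
To illustrate the point about entity resolution, here is a minimal sketch.
It does not use NekoHTML; it assumes the HTML parser bundled with the JDK
(javax.swing.text.html.parser.ParserDelegator) is good enough for a
demonstration. The class name EntityDemo and the method extractText are
made up for this example. The idea is the same with any real HTML parser:
the callback receives already-decoded text, so the literal character, the
named entity, and the numeric reference all arrive as the same "ö".

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class: collects the text content of an HTML string.
public class EntityDemo {

    // Parses the HTML and returns the concatenated text nodes.
    // Entity references are resolved by the parser before handleText is called.
    static String extractText(String html) throws IOException {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Separate text chunks with a space so words do not run together.
                text.append(data).append(' ');
            }
        };
        // 'true' tells the parser to ignore any charset declared in the document,
        // since we are already reading decoded characters from a String.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return text.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // The same word written three ways: literal, named entity, numeric reference.
        String html = "<html><body><p>sch\u00f6n sch&ouml;n sch&#246;n</p></body></html>";
        System.out.println(extractText(html));
    }
}
```

The text returned by extractText is what you would hand to the indexer.
The raw HTML source should not be indexed directly, because then the
entity spellings would end up as separate terms.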

Discussion Overview
group: java-user
categories: lucene
posted: May 3, '05 at 8:36a
active: May 3, '05 at 4:13p
posts: 3
users: 3
website: lucene.apache.org