Do you just want to ignore the tags and store everything in one field? If
you know in advance which tags are used, I guess you could set up a
stop-word list with them. If not, you could write an "XMLAnalyzer" that
simply ignores everything inside '<>'...
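A minimal sketch of the "ignore everything inside '<>'" idea, in plain Java and not tied to any particular Lucene version. `TagStrippingReader` is a hypothetical name, not a Lucene class. It wraps any `java.io.Reader` and drops every character from '<' through '>', turning each tag into a single space so the words on either side stay separate. Lucene analyzers consume a `Reader`, so a wrapper like this could be put in front of an existing analyzer instead of writing a whole tokenizer; how you plug it in depends on your Lucene version. It is deliberately naive: comments, CDATA sections, entities such as `&amp;` and '>' inside attribute values are not handled.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/**
 * Hypothetical Reader wrapper that skips everything between '<' and '>'.
 * Each closing '>' is emitted as a space so "a<br/>b" becomes "a b".
 * Naive: no handling of comments, CDATA, entities or '>' in attributes.
 */
public class TagStrippingReader extends Reader {
    private final Reader in;
    private boolean insideTag = false;

    public TagStrippingReader(Reader in) {
        this.in = in;
    }

    /** Returns the next character outside a tag, or -1 at end of input. */
    private int nextVisible() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (insideTag) {
                if (c == '>') {
                    insideTag = false;
                    return ' '; // tag boundary becomes whitespace
                }
                // any other character inside a tag is dropped
            } else if (c == '<') {
                insideTag = true;
            } else {
                return c;
            }
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = nextVisible();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n; // -1 signals end of stream
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Convenience helper: strips tags from a whole string. */
    public static String strip(String xml) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new TagStrippingReader(new StringReader(xml))) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("<doc><title>Lucene</title><body>ignores tags</body></doc>"));
    }
}
```

Replacing each tag with a space rather than nothing is a design choice: it avoids merging "Lucene" and "ignores" into one token when two elements are adjacent, at the cost of some extra whitespace the tokenizer will discard anyway.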

If you want to split the XML content into separate fields, you have to
parse it before indexing. Take a look at this article:
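For the split-into-fields route, a sketch that uses only the JDK's DOM parser (`javax.xml.parsers`), assuming each direct child of the root element should become one field. `XmlFieldSplitter` and `split` are hypothetical names. The resulting name/value pairs are plain text with the tags already removed, so each one could then be added to a Lucene `Document` as its own field; the exact `Field` constructor differs between Lucene versions, so that step is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmlFieldSplitter {
    /**
     * Maps each direct child element of the root to its text content
     * (tags removed). Repeated element names are joined with a space.
     */
    public static Map<String, String> split(String xml) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        NodeList children = dom.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes, comments, etc.
            }
            Element el = (Element) node;
            // getTextContent() concatenates all descendant text, dropping nested tags
            String text = el.getTextContent().trim();
            fields.merge(el.getTagName(), text, (a, b) -> a + " " + b);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = split(
                "<doc><title>Lucene in Action</title><body>Index <b>text</b> only</body></doc>");
        fields.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Nested elements are flattened into their parent's text here; a finer mapping (one field per element path, attributes as fields) would need extra rules. For untrusted input, DTD and external-entity processing should also be disabled on the factory.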

I'm a little bit new to Lucene, so I might be missing something here,
but I wouldn't expect it to have an API for this...

Kalani Ruwanpathirana wrote:
Hi all,

I am searching for a way to ignore XML tags in the input when indexing. Is
there built-in functionality in Lucene to get this done?
I am sorry if this was discussed before. I searched but couldn't find a
clear solution.

Thanks in advance

*Marcelo Frantz Schneider*
/SIC - TCO - Tecnologia em Engenharia do Conhecimento/

*E-mail:* marcelo.schneider@digitro.com.br

*Site:* www.digitro.com <http://www.digitro.com>

This message has been checked by Dígitro's antivirus system and is
believed to be free of threats.

To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Discussion Overview
group: java-user
posted: Jul 24, '08 at 6:18a
active: Jul 25, '08 at 12:48p