FAQ
Do you just want to ignore them and store all in one field? If you know
the used tags previously, I guess you could set up a stop words list
with them. If not, you could do an "XMLAnalyzer" that simply ignores
everything inside '<>'...

If you want to split the xml content in separate fields, you have to
parse it before indexing, take a look at this article:
http://www.ibm.com/developerworks/library/j-lucene/

I'm a little bit new to Lucene, so I might be missing something here,
but I wouldn't expect it to have an API for this...


Kalani Ruwanpathirana escreveu:
Hi all,

I am searching for a way to ignore XML tags in the input when indexing. Is
there a built in functionality in Lucene to get this done?
I am sorry if this was discussed before. I searched but couldn't find a
clear solution.

Thanks in advance
Kalani
--


*Marcelo Frantz Schneider*
/SIC - TCO - Tecnologia em Engenharia do Conhecimento/

*DÍGITRO TECNOLOGIA*
*E-mail:* marcelo.schneider@digitro.com.br

***Site:* www.digitro.com <http://www.digitro.com>

--
Esta mensagem foi verificada pelo sistema de antivírus da Dígitro e
acredita-se estar livre de perigo.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

Discussion Posts

Previous

Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 2 of 6 | next ›
Discussion Overview
groupjava-user @
categorieslucene
postedJul 24, '08 at 6:18a
activeJul 25, '08 at 12:48p
posts6
users3
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase