Hi all,
I'm using the StandardAnalyzer in an application based on Lucene 1.4.2.
Currently, and by default, the StandardAnalyzer "throws
semicolon signs away" at index and store time. For example, a document
like "ee3e st&auml;der" looks like "ee3e st&aumlder" when
retrieved from the index (that is, the ;-sign is missing). The
document is stored as a Field.Text in the index.
What I would like to do is to index, and store, words like
"st&auml;der" and retrieve them in exactly the same form, i.e. as
"st&auml;der".
I can imagine that the result I would like to achieve can be produced
by some modifications to the StandardTokenizer.jj (or somewhere else).
Can someone please help me by showing me where/how such a change can be
made?
(Note: It is not necessary to be able to search for text that includes
the semicolon sign, just to retrieve the text in its original
format.)
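To make the requirement concrete, here is a minimal plain-Java sketch
of the behaviour I am after. It makes no Lucene calls: the class
StoredVersusIndexed and the methods indexTerms and storedValue are
hypothetical names used only for illustration, and the splitting rule
is a rough stand-in, not the actual StandardTokenizer grammar. It
assumes the text contains an HTML entity such as &auml;.

```java
import java.util.ArrayList;
import java.util.List;

public class StoredVersusIndexed {

    // Hypothetical stand-in for an analyzer: lower-cases the text and
    // splits on every character that is not a letter or a digit, so
    // '&' and ';' never end up in an index term.
    static List<String> indexTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    // The stored value should be the original text, returned unchanged.
    static String storedValue(String text) {
        return text;
    }

    public static void main(String[] args) {
        String document = "ee3e st&auml;der";
        // Terms used for searching may lose the punctuation.
        System.out.println("index terms:  " + indexTerms(document));
        // The retrieved text should still contain the ';'.
        System.out.println("stored value: " + storedValue(document));
    }
}
```

In other words, it is fine if the search terms lose the "&" and ";"
characters, as long as the stored value comes back unchanged.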
cheers
Clas / Frisim.com