Hi all,

I'm using the StandardAnalyzer in an application based on Lucene 1.4.2.

Currently, and by default, the StandardAnalyzer "throws
semicolon-signs away" at index and store time. For example, a document
like "ee3e st&#x00e4;der" looks like "ee3e st&#x00e4der" when
retrieved from the index (that is, the ;-sign is missing). The
document is stored as a Field.Text in the index.

What I would like to do is to index, and store, words like
"st&#x00e4;der" and retrieve them in exactly the same form, i.e. as
"st&#x00e4;der".

I imagine the result I want could be achieved by modifying
StandardTokenizer.jj (or something else). Could someone please show me
where and how such a change can be made?

(Note: It is not necessary to be able to search for text with
semicolon-sign included, just to retrieve them in their original
format.)

cheers
Clas / Frisim.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


  • Erik Hatcher at Nov 26, 2004 at 12:35 am

    On Nov 25, 2004, at 6:05 PM, clas wrote:
    > Currently, and by default, the StandardAnalyzer "throws
    > semicolon-signs away" at index and store time.

    This is incorrect. *Nothing* is changed in what is stored from the
    original String. Chances are something else is causing you to see the
    semicolon dropped.

    > (Note: It is not necessary to be able to search for text with
    > semicolon-sign included, just to retrieve them in their original
    > format.)

    Since searching for the semicolon is not necessary, all should be well
    once you figure out where else in your system the semicolon is dropped.

    Erik
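
One way to act on the advice above is to capture the text at each stage of the application (for example just before it is passed to Field.Text, right after Document.get returns it, and after it is rendered) and compare each copy with the original. The sketch below is plain Java and does not use Lucene at all; `StageDiff`, `firstDifference`, `firstChangedStage`, the stage names, and the sample strings in `main` are all hypothetical and only illustrate the idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Lucene: reports where two strings first differ. */
final class StageDiff {

    /** Returns the index of the first differing character, or -1 if the strings are equal. */
    static int firstDifference(String expected, String actual) {
        int shared = Math.min(expected.length(), actual.length());
        for (int i = 0; i < shared; i++) {
            if (expected.charAt(i) != actual.charAt(i)) {
                return i;
            }
        }
        // Same prefix: either equal, or one string is longer than the other.
        return expected.length() == actual.length() ? -1 : shared;
    }

    /** Returns the name of the first stage whose text differs from the original, or null if none does. */
    static String firstChangedStage(String original, Map<String, String> stages) {
        for (Map.Entry<String, String> stage : stages.entrySet()) {
            if (firstDifference(original, stage.getValue()) != -1) {
                return stage.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String original = "ee3e st&#x00e4;der";

        // Made-up sample values; in a real application these would be
        // captured at each stage of the indexing and display pipeline.
        Map<String, String> stages = new LinkedHashMap<String, String>();
        stages.put("before indexing", "ee3e st&#x00e4;der");
        stages.put("after retrieval", "ee3e st&#x00e4;der");
        stages.put("after rendering", "ee3e st&#x00e4der");

        System.out.println("First changed stage: " + firstChangedStage(original, stages));
    }
}
```

The insertion-ordered map keeps the stages in pipeline order, so the first stage reported is the earliest point at which the text no longer matches the original.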


  • Otis Gospodnetic at Nov 26, 2004 at 1:04 am
    You could use Field.Keyword.

    Otis



Discussion Overview
group: java-user @
categories: lucene
posted: Nov 25, '04 at 11:05p
active: Nov 26, '04 at 1:04a
posts: 3
users: 3
website: lucene.apache.org
