It is my understanding that the StandardAnalyzer will remove underscores
- so "some_word" will be indexed as 'some' and 'word'.

I want to keep the underscores, so I was thinking of changing over to an
Analyzer that uses the WhitespaceTokenizer, LowerCaseFilter, and StopFilter.

What other tokenizing magic will I lose by changing away from the
StandardAnalyzer?

Thanks,

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Erik Hatcher at Aug 8, 2005 at 3:01 pm

    On Aug 8, 2005, at 10:43 AM, Dan Armbrust wrote:
    > It is my understanding that the StandardAnalyzer will remove
    > underscores - so "some_word" will be indexed as 'some' and 'word'.
    >
    > I want to keep the underscores, so I was thinking of changing over
    > to an Analyzer that uses the WhitespaceTokenizer, LowerCaseFilter,
    > and StopFilter.
    >
    > What other tokenizing magic will I lose by changing away from the
    > StandardAnalyzer?

    The best thing you can do is set up a test environment to try out
    sample text with various analyzers. Lucene in Action's source code
    (http://www.lucenebook.com) comes with such a demo that you can
    easily tweak. Here's a sample of running "ant AnalyzerDemo":

    [echo] Running lia.analysis.AnalyzerDemo...
    [java] Analyzing "some_word"
    [java] WhitespaceAnalyzer:
    [java] [some_word]

    [java] SimpleAnalyzer:
    [java] [some] [word]

    [java] StopAnalyzer:
    [java] [some] [word]

    [java] StandardAnalyzer:
    [java] [some] [word]

    [java] SnowballAnalyzer:
    [java] [some] [word]

    [java] SnowballAnalyzer:
    [java] [some] [word]

    [java] SnowballAnalyzer:
    [java] [some] [word]
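
    If Lucene is not handy, the difference in the output above can be
    approximated in plain Java. The sketch below is not Lucene code: the
    class TokenizerSketch, its method names, and the small stop list are
    hypothetical and exist only for illustration. It mimics splitting on
    whitespace versus splitting on non-letters, followed by lowercasing
    and stop-word removal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TokenizerSketch {
    // Hypothetical stop list for illustration; not Lucene's default set.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "to");

    // Mimics whitespace tokenization: split on runs of whitespace only,
    // so underscores and punctuation stay inside the token.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Mimics letter-based tokenization: any non-letter character
    // (underscore, comma, period, digit) ends the current token.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Lowercases each token, then drops it if it is in the stop list.
    static List<String> lowerCaseAndStop(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            String lower = t.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "The some_word test, and more.";
        // prints [some_word, test,, more.]
        System.out.println(lowerCaseAndStop(whitespaceTokens(text)));
        // prints [some, word, test, more]
        System.out.println(lowerCaseAndStop(letterTokens(text)));
    }
}
```

    Note the trade-off the first line of output shows: splitting on
    whitespace keeps "some_word" intact, but it also leaves trailing
    punctuation attached ("test," and "more."), so those tokens would
    not match a search for "test" or "more" unless the punctuation is
    handled separately.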

