Hi,

I am using the PorterStemAnalyzer class (attached) to provide stemming
for a Lucene index.

To stem the terms in the index we use the following...

// open an index writer in append mode
IndexWriter idxWriter = new IndexWriter(LUCENE_INDEX_PATH,
        new PorterStemAnalyzer(), false);

//add the lucene document to the index
idxWriter.addDocument(idxDoc);

Having inspected the index using Luke, I can confirm that the terms are
being stemmed as expected. However, I am not clear whether, for this to
work properly, I should also be stemming the search terms that users
enter.

For example there is a term "relax" in the index which I guess is
stemmed from "relaxation". If the user searches on "relaxing" do I need
to stem the search term in order for it to return the result?

At the moment I am attempting to do this as follows...

analyzer = new PorterStemAnalyzer();
parser = new QueryParser("content", analyzer);
Query query = parser.parse("keywords: relaxing");
Hits hits = idxSearcher.search(query);

...but this is not returning any matches.

Thanks
Rob Walpole
Devon Portal Developer
Web http://www.devonline.gov.uk



<<PorterStemAnalyzer.java>>


  • Erick Erickson at Jun 22, 2007 at 4:13 pm
    Yes, you should also stem the query terms. Otherwise, you'll have
    indexed "working" as "work", but your search for "working" will look
    for "working" and won't match. Which is not what you want, I'm sure.

    Query.toString() will tell you a lot about how queries are
    processed, BTW....

    In general, unless you're very sure what the effects are, you should
    use the same analyzer for indexing as you use for searching.

    Best
    Erick
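
The point about using the same analysis on both sides can be illustrated without Lucene. The sketch below is a toy model only: `ToyStemAnalyzer`, `ToyIndex`, and the suffix list are hypothetical names made up for illustration, and the suffix stripping is a crude stand-in, not the Porter algorithm. It indexes "Relaxation techniques" with stemming, then looks up "relaxing" once without analysis and once through the same analysis used at index time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class StemDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument(1, "Relaxation techniques");
        System.out.println(index.searchRaw("relaxing"));      // unstemmed query term
        System.out.println(index.searchAnalyzed("relaxing")); // stemmed like the index
    }
}

/** Toy stand-in for an analyzer: lowercases, splits on non-letters, strips a few suffixes. */
class ToyStemAnalyzer {
    // Hypothetical suffix list for illustration only; this is NOT the Porter algorithm.
    private static final String[] SUFFIXES = {"ation", "ing", "ed", "s"};

    static String stem(String word) {
        for (String suffix : SUFFIXES) {
            // Keep at least three characters so short words are left alone.
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                terms.add(stem(token));
            }
        }
        return terms;
    }
}

/** Toy inverted index mapping each stemmed term to the ids of documents containing it. */
class ToyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void addDocument(int docId, String text) {
        for (String term : ToyStemAnalyzer.analyze(text)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    /** Looks the term up exactly as given, with no analysis. */
    Set<Integer> searchRaw(String term) {
        return postings.getOrDefault(term, new HashSet<>());
    }

    /** Runs the query text through the same analysis used at index time. */
    Set<Integer> searchAnalyzed(String queryText) {
        Set<Integer> hits = new HashSet<>();
        for (String term : ToyStemAnalyzer.analyze(queryText)) {
            hits.addAll(searchRaw(term));
        }
        return hits;
    }
}
```

Running `StemDemo` prints `[]` for the raw lookup and `[1]` for the analyzed one: the index only contains the stem "relax", so the query term has to be reduced to the same stem before it can match.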

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: [email protected]
    For additional commands, e-mail: [email protected]
  • Jiye Yu at Jun 22, 2007 at 9:27 pm
    Hi,

    I guess an Analyzer (built-in ones such as StandardAnalyzer,
    PorterStemAnalyzer, etc.) is not thread safe. But I wonder if it's OK
    to share the same analyzer object within a thread. For example, if I
    want to create a PerFieldAnalyzerWrapper for 5 fields, can I use the
    same Analyzer object for all the fields? Or can both a QueryParser and
    Indexer/Searcher use the same Analyzer object?

    Any comment?


    Thanks!

    Jay

  • Yonik Seeley at Jun 22, 2007 at 9:32 pm

    On 6/22/07, Jiye Yu wrote:
    > I guess an Analyzer (built-in ones such as StandardAnalyzer,
    > PorterStemAnalyzer, etc.) is not thread safe.

    Analyzers *are* thread-safe.
    Multiple threads can all call analyzer.tokenStream() without any
    synchronization.

    -Yonik
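
One way to picture why a shared analyzer can be safe is to separate the stateless factory from the stateful stream it hands out. The sketch below is a toy model under that assumption, not Lucene code: `ToyAnalyzer` and `ToyTokenStream` are hypothetical classes, and the assumption is that each `tokenStream()` call returns a brand-new stream object with its own cursor. Several threads then use one analyzer instance without any locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedAnalyzerDemo {
    public static void main(String[] args) {
        System.out.println(run(4, "Relaxing by the Sea"));
    }

    /** Analyzes the same text on several threads using ONE shared analyzer. */
    static List<List<String>> run(int threads, String text) {
        ToyAnalyzer shared = new ToyAnalyzer();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                // Each task asks the shared analyzer for its own private stream.
                Callable<List<String>> task = () -> drain(shared.tokenStream(text));
                futures.add(pool.submit(task));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Reads a stream to the end and collects its tokens. */
    static List<String> drain(ToyTokenStream stream) {
        List<String> tokens = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            tokens.add(t);
        }
        return tokens;
    }
}

/** Holds no mutable state, so sharing it across threads needs no synchronization. */
class ToyAnalyzer {
    ToyTokenStream tokenStream(String text) {
        return new ToyTokenStream(text); // fresh stream, and fresh cursor, per call
    }
}

/** Stateful: tracks a position, so each instance should belong to a single caller. */
class ToyTokenStream {
    private final String[] words;
    private int position = 0;

    ToyTokenStream(String text) {
        this.words = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
    }

    /** Returns the next token, or null when the stream is exhausted. */
    String next() {
        return position < words.length ? words[position++] : null;
    }
}
```

In this model the object that must not be shared is the stream, because it carries the cursor. The analyzer only builds streams, so one instance can serve a query parser and an indexer at the same time.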

  • Jiye Yu at Jun 22, 2007 at 9:42 pm
    I see.
    I guess those Filters (e.g. PorterStemFilter) that make up the analyzer
    are not thread safe or cannot be shared.
    Thanks for your quick response!

    Jay

  • Steven Rowe at Jun 22, 2007 at 4:27 pm
    Hi Rob,

    Robert Walpole wrote:
    > At the moment I am attempting to do this as follows...
    >
    > analyzer = new PorterStemAnalyzer();
    > parser = new QueryParser("content", analyzer);
    > Query query = parser.parse("keywords: relaxing");
    > Hits hits = idxSearcher.search(query);
    >
    > ...but this is not returning any matches.

    Are you trying to look up "relaxing" in a field named "keywords"? If
    so, the intervening space defeats this: your code results in a search
    for (stemmed versions of) "keywords:" or "relaxing" in the default
    "content" field.

    --
    Steve Rowe
    Center for Natural Language Processing
    http://www.cnlp.org/tech/lucene.asp


Discussion Overview
group: java-user @
category: lucene
posted: Jun 22, '07 at 3:40p
active: Jun 22, '07 at 9:42p
posts: 6
users: 5
website: lucene.apache.org
