FAQ
Hi

Guy's

Apologies.........


I am little Confused with the Search Factor.


If the Search Word 'kid' is suppose to return me kid , kid's ,
kidoos, children

1) Do I need to use Combination of more then one Analysers ??? , If
so How.
2) Any Alternate modification to be done for the simple Searcher
methods. ??




Thx in advance.



-----Original Message-----
From: gwithers@connected.com
Sent: Thursday, October 28, 2004 8:55 PM
To: lucene-user@jakarta.apache.org
Subject: RE: Searchable Solutions Please


A quick pointer..

What you want to look at is using a stemming implementation. Look, for
example, at the FAQ and docs related to the PorterStemFilter and writing
A customer analyzer
(http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.index
ing&toc=faq#q17).

There is a lot of information regarding this but you'll need the same
analyzer for index and query and this would be more or less English only.

-George
-----Original Message-----
From: Karthik N S
Sent: Thursday, October 28, 2004 1:47 AM
To: LUCENE
Subject: Searchable Solutions Please


Hi Guys


Aplologies....................


On a Using the Lucene Search , If returned hits for the following is to be
aquired

Search Word =' kids watches '
Hits on docs returned should have = kid's , kid watch , junior watches


Solution's Please


Thx in advance






WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Search Discussions

  • Erik Hatcher at Nov 3, 2004 at 9:55 am

    On Nov 2, 2004, at 11:15 PM, Karthik N S wrote:
    If the Search Word 'kid' is suppose to return me kid ,
    kid's ,
    kidoos, children

    1) Do I need to use Combination of more then one Analysers ???
    , If
    so How.
    2) Any Alternate modification to be done for the simple Searcher
    methods. ??
    <marketing>We cover this very scenario in Lucene in Action</marketing>
    :)

    You may need a combination of Analyzers, depending on what type of
    analysis makes sense for each field.

    However, let's not concern ourselves with multiple analyzers for the
    sake of this issue. Suppose you have the following documents (all in a
    single "contents" field):

    Doc 1: "My kid beat me in chess" [yes, it's true]
    Doc 2: "The kid's brilliant"
    Doc 3: "Hey kidoos, it's time to get ready for bed"
    Doc 4: "Children are our most precious resource"

    There are two approaches - expansion of synonyms at indexing time, or
    query expansion.

    Expansion at indexing time will increase the size of the index, of
    course, but with the benefit of simpler and faster queries. An
    analyzer would inject the synonyms into the same virtual position as
    the original word (using position offset of 0 for the injected words).
    Here's what the heart of the Lucene in Action SynonymAnalyzer looks
    like:

    public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new SynonymFilter(
    new StopFilter(
    new LowerCaseFilter(
    new StandardFilter(
    new StandardTokenizer(reader))),
    StandardAnalyzer.STOP_WORDS),
    engine
    );
    return result;
    }

    This is the StandardAnalyzer wrapped with a custom SynonymFilter. The
    SynonymFilter is driven by a SynonymEngine:

    public interface SynonymEngine {
    String[] getSynonyms(String s) throws IOException;
    }

    The implementations of the SynonymEngine are up to your imagination. I
    created a mock one for unit testing, and a real one using the WordNet
    database.

    In my examples, I show the SynonymAnalyzer used during indexing, but
    during searching I use the StandardAnalyzer. Since the synonyms have
    already been injected during indexing, it is unnecessary to be
    concerned with them during querying. (it gets tricky using analyzers
    that stack tokens into the same position when using QueryParser and
    PhraseQuery)

    Taking Doc 1 as an example, with a hypothetical SynonymEngine, the
    tokens emitted from a SynonymAnalyzer could be:

    1: my
    2: kid kids children kidoos
    3: beat
    4: me
    5: in
    6: chess

    In the 2nd position, all the synonyms appear for "kid". Now phrase
    queries such as "my kidoos" work just fine.

    Hopefully this gives you some ideas on where to go from here.

    Erik


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
    For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedNov 3, '04 at 4:17a
activeNov 3, '04 at 9:55a
posts2
users2
websitelucene.apache.org

2 users in discussion

Karthik N S: 1 post Erik Hatcher: 1 post

People

Translate

site design / logo © 2022 Grokbase