FAQ
Hello,

I have a stemmed index, but i want to search the exact form of a word.
I use French Analyzer, so for instance "progression", "progresser" are
indexed with the linguistic root "progress".
But if I want to search the word "progress" (and only this word), I have to
many hits (because of "progression", "progresser"...)
The field is indexed, tokenized and no store...

Is there a way to do this, I mean to search an exact word in a stemmed index
?
I suppose that I have to use the same analyzer for indexing and searching.


I try with a PhraseQuery, with quotes...

Ps : I use lucene 1.9.1

Thanks
Renald

Search Discussions

  • Renou oki at Jun 25, 2008 at 8:22 am
    Hello,

    I have a stemmed index, but i want to search the exact form of a word.
    I use French Analyzer, so for instance "progression", "progresser" are
    indexed with the linguistic root "progress".
    But if I want to search the word "progress" (and only this word), I have to
    many hits (because of "progression", "progresser"...)
    The field is indexed, tokenized and no store...

    Is there a way to do this, I mean to search an exact word in a stemmed index
    ?
    I suppose that I have to use the same analyzer for indexing and searching.


    I try with a PhraseQuery, with quotes...

    Ps : I use lucene 1.9.1

    Thanks
    Renald
  • Erick Erickson at Jun 25, 2008 at 12:54 pm
    The way I've solved this is to index the stemmed *and* a special
    token at the same position (see Synonym Analyzer). The From your
    example, say you're indexing progresser. You'd go ahead and index the
    stemmed version , "progress", AND you'd also index "progresser$"
    at the same offset. Now, when you want exact matches, search for
    the token with the $ at the end.

    This does make your index a bit larger, but not as much as you'd expect.

    Best
    Erick
    On Wed, Jun 25, 2008 at 4:21 AM, renou oki wrote:

    Hello,

    I have a stemmed index, but i want to search the exact form of a word.
    I use French Analyzer, so for instance "progression", "progresser" are
    indexed with the linguistic root "progress".
    But if I want to search the word "progress" (and only this word), I have to
    many hits (because of "progression", "progresser"...)
    The field is indexed, tokenized and no store...

    Is there a way to do this, I mean to search an exact word in a stemmed
    index
    ?
    I suppose that I have to use the same analyzer for indexing and searching.


    I try with a PhraseQuery, with quotes...

    Ps : I use lucene 1.9.1

    Thanks
    Renald
  • Matthew Hall at Jun 26, 2008 at 4:23 pm
    You could also add another data field to the index, with an untokenized
    version of your data, and then use a multifield query to go against both
    the stemmed and exact match parts of your search at the same time.

    This is a technique I've used quite often on my project with various
    different requirements for the second field. Mind you it makes the
    indexes bigger, but unless your dataset is large its not really a huge
    problem.

    Matt

    Erick Erickson wrote:
    The way I've solved this is to index the stemmed *and* a special
    token at the same position (see Synonym Analyzer). The From your
    example, say you're indexing progresser. You'd go ahead and index the
    stemmed version , "progress", AND you'd also index "progresser$"
    at the same offset. Now, when you want exact matches, search for
    the token with the $ at the end.

    This does make your index a bit larger, but not as much as you'd expect.

    Best
    Erick

    On Wed, Jun 25, 2008 at 4:21 AM, renou oki wrote:

    Hello,

    I have a stemmed index, but i want to search the exact form of a word.
    I use French Analyzer, so for instance "progression", "progresser" are
    indexed with the linguistic root "progress".
    But if I want to search the word "progress" (and only this word), I have to
    many hits (because of "progression", "progresser"...)
    The field is indexed, tokenized and no store...

    Is there a way to do this, I mean to search an exact word in a stemmed
    index
    ?
    I suppose that I have to use the same analyzer for indexing and searching.


    I try with a PhraseQuery, with quotes...

    Ps : I use lucene 1.9.1

    Thanks
    Renald
    --
    Matthew Hall
    Software Engineer
    Mouse Genome Informatics
    mhall@informatics.jax.org
    (207) 288-6012



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Renou oki at Jun 27, 2008 at 7:33 am
    Thanks for the reply.

    I will try to add an other data field.
    I thought about this solution but i was not very sure. I thought that was an
    easier solution to do that...


    best regards
    Renou

    2008/6/26 Matthew Hall <mhall@informatics.jax.org>:
    You could also add another data field to the index, with an untokenized
    version of your data, and then use a multifield query to go against both the
    stemmed and exact match parts of your search at the same time.

    This is a technique I've used quite often on my project with various
    different requirements for the second field. Mind you it makes the indexes
    bigger, but unless your dataset is large its not really a huge problem.

    Matt

    Erick Erickson wrote:
    The way I've solved this is to index the stemmed *and* a special
    token at the same position (see Synonym Analyzer). The From your
    example, say you're indexing progresser. You'd go ahead and index the
    stemmed version , "progress", AND you'd also index "progresser$"
    at the same offset. Now, when you want exact matches, search for
    the token with the $ at the end.

    This does make your index a bit larger, but not as much as you'd expect.

    Best
    Erick

    On Wed, Jun 25, 2008 at 4:21 AM, renou oki wrote:


    Hello,

    I have a stemmed index, but i want to search the exact form of a word.
    I use French Analyzer, so for instance "progression", "progresser" are
    indexed with the linguistic root "progress".
    But if I want to search the word "progress" (and only this word), I have
    to
    many hits (because of "progression", "progresser"...)
    The field is indexed, tokenized and no store...

    Is there a way to do this, I mean to search an exact word in a stemmed
    index
    ?
    I suppose that I have to use the same analyzer for indexing and
    searching.


    I try with a PhraseQuery, with quotes...

    Ps : I use lucene 1.9.1

    Thanks
    Renald

    --
    Matthew Hall
    Software Engineer
    Mouse Genome Informatics
    mhall@informatics.jax.org
    (207) 288-6012



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Matthew Hall at Jun 27, 2008 at 1:38 pm
    Also, please note that I thought about it and realized that I mispoke
    when I sent out my original suggestion. You don't want an untokenized
    field in your case, you want an unstemmed one instead.

    This will allow you to get the functionality you are looking for.. at
    least I believe so ^^

    Anyhow, best of luck!

    Matt

    renou oki wrote:
    Thanks for the reply.

    I will try to add an other data field.
    I thought about this solution but i was not very sure. I thought that was an
    easier solution to do that...


    best regards
    Renou

    2008/6/26 Matthew Hall <mhall@informatics.jax.org>:

    You could also add another data field to the index, with an untokenized
    version of your data, and then use a multifield query to go against both the
    stemmed and exact match parts of your search at the same time.

    This is a technique I've used quite often on my project with various
    different requirements for the second field. Mind you it makes the indexes
    bigger, but unless your dataset is large its not really a huge problem.

    Matt

    Erick Erickson wrote:

    The way I've solved this is to index the stemmed *and* a special
    token at the same position (see Synonym Analyzer). The From your
    example, say you're indexing progresser. You'd go ahead and index the
    stemmed version , "progress", AND you'd also index "progresser$"
    at the same offset. Now, when you want exact matches, search for
    the token with the $ at the end.

    This does make your index a bit larger, but not as much as you'd expect.

    Best
    Erick

    On Wed, Jun 25, 2008 at 4:21 AM, renou oki wrote:



    Hello,

    I have a stemmed index, but i want to search the exact form of a word.
    I use French Analyzer, so for instance "progression", "progresser" are
    indexed with the linguistic root "progress".
    But if I want to search the word "progress" (and only this word), I have
    to
    many hits (because of "progression", "progresser"...)
    The field is indexed, tokenized and no store...

    Is there a way to do this, I mean to search an exact word in a stemmed
    index
    ?
    I suppose that I have to use the same analyzer for indexing and
    searching.


    I try with a PhraseQuery, with quotes...

    Ps : I use lucene 1.9.1

    Thanks
    Renald


    --
    Matthew Hall
    Software Engineer
    Mouse Genome Informatics
    mhall@informatics.jax.org
    (207) 288-6012



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    --
    Matthew Hall
    Software Engineer
    Mouse Genome Informatics
    mhall@informatics.jax.org
    (207) 288-6012



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedJun 25, '08 at 8:16a
activeJun 27, '08 at 1:38p
posts6
users3
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase