Hi Guys,

I have enabled stemming:
   <fieldType name="text_stem" class="solr.TextField">
     <analyzer>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.SnowballPorterFilterFactory" language="English"/>
     </analyzer>
   </fieldType>

In the Admin Analysis screen, when I type in running or runs, both are reduced to run.
However, when I search for run, runs, or running with an actual query, I get three different sets of results.

Is that correct?

I would have expected all three to return exactly the same result set.

Sas

  • Ahmet Arslan at Jun 16, 2016 at 7:38 pm
    Hi Jamal,

    Snowball requires a lowercase filter before it in the analyzer chain.
    This is documented in the javadocs; it is a small but important detail.
    Please add a lowercase filter after the whitespace tokenizer.


    Ahmet
  • Jamal, Sarfaraz at Jun 16, 2016 at 7:59 pm
    Hi Ahmet,

    Thanks for your guidance.

    I just tried the following two configurations:

       <fieldType name="text_stem" class="solr.TextField">
         <analyzer>
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.SnowballPorterFilterFactory" language="English"/>
         </analyzer>
       </fieldType>

    And

       <fieldType name="text_stem" class="solr.TextField">
       <analyzer>
         <tokenizer class="solr.StandardTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.EnglishPossessiveFilterFactory"/>
         <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
         <filter class="solr.SnowballPorterFilterFactory"/>
       </analyzer>
       </fieldType>

    Both still produced three different sets of results.

  • Aurélien MAZOYER at Jun 16, 2016 at 8:20 pm
    Hi,

    Yes, you should get the same result set.

    Are you sure that you reindexed all the data after changing your schema?
    Are you sure that the analyzer is applied at both index time and query time?
    Are you sure you are querying only one field?

    Regards,

    Aurélien

  • Jamal, Sarfaraz at Jun 16, 2016 at 8:29 pm
    Hello =)

    Just to be safe, and to make sure stemming happens at index time as well as at query time, I modified it like so:

       <fieldType name="text_stem" class="solr.TextField">
         <analyzer type="index">
           <tokenizer class="solr.StandardTokenizerFactory"/>
           <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.EnglishPossessiveFilterFactory"/>
           <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
           <filter class="solr.SnowballPorterFilterFactory"/>
         </analyzer>
         <analyzer type="query">
           <tokenizer class="solr.StandardTokenizerFactory"/>
           <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.EnglishPossessiveFilterFactory"/>
           <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
           <filter class="solr.SnowballPorterFilterFactory"/>
         </analyzer>
       </fieldType>

    I am re-indexing the files now.
    And what do you mean by querying only one field? I am not entirely sure I understand.

    Sas

  • Aurélien MAZOYER at Jun 16, 2016 at 8:35 pm
    Hi,

    I was just wondering whether you are sure that you query only that field (or
    other fields that use your text_stem analyzer) and not other fields (in your
    qf, for example, if you use edismax), which could give you incorrect results.
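
    As a rough illustration of the point above, here is a minimal schema sketch. Only the text_stem type comes from this thread; the field names (content, content_stemming), the text_general type, and the copyField rule are assumptions made for the example. A query only gets stemmed matching when it is run against a field whose type includes the stemmer.

    ```xml
    <!-- Hypothetical source field using a non-stemming type (assumed to exist in the schema). -->
    <field name="content" type="text_general" indexed="true" stored="true"/>

    <!-- Hypothetical stemmed field: with text_stem, "run", "runs" and "running"
         are all indexed and queried as "run". -->
    <field name="content_stemming" type="text_stem" indexed="true" stored="false"/>

    <!-- Copy the raw text into the stemmed field at index time (assumed rule). -->
    <copyField source="content" dest="content_stemming"/>
    ```

    With a layout like this, a query against content_stemming should treat the three word forms alike, while the same query against content would not.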

    Regards,

    Aurélien

  • Jamal, Sarfaraz at Jun 16, 2016 at 8:36 pm
    Oh, is this what you meant?

       <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
         <lst name="defaults">
           <str name="df">content_stemming</str>
           <!-- <str name="df">_text_</str> -->
         </lst>
       </initParams>

    I changed df from _text_ to content_stemming, and now it seems to work :)
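
    For the edismax case mentioned earlier in the thread, a similar sketch might look like the following. It is a hedged example, not the poster's configuration: the handler paths are shortened, and title_stemming is a hypothetical second field assumed to use the text_stem type. With edismax, qf (rather than df) decides which fields are searched, so every field listed there would need a stemming type for run, runs, and running to match the same documents.

    ```xml
    <initParams path="/select,/query">
      <lst name="defaults">
        <!-- Use the Extended DisMax query parser. -->
        <str name="defType">edismax</str>
        <!-- Search only stemmed fields; title_stemming is a hypothetical example. -->
        <str name="qf">content_stemming title_stemming</str>
      </lst>
    </initParams>
    ```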

    Thanks! I will update if I discover anything amiss

    Thanks again so much =)

    Sas

  • Aurélien MAZOYER at Jun 16, 2016 at 8:42 pm
    No problem :-)

    Aurélien


Discussion Overview
group: solr-user @
categories: lucene
posted: Jun 16, '16 at 7:13p
active: Jun 16, '16 at 8:42p
posts: 8
users: 3
website: lucene.apache.org...
