Right now, when I search for 'brake a leg', Solr returns valid results with
no indication of a misspelling, which is understandable since all of those
terms are valid words and are probably found in a few pieces of our content.
My question is:

Is there any way for it to recognize that the phrase should be "break a leg"
and not "brake a leg" and suggest the proper phrase?


  • Dyer, James at Feb 23, 2011 at 7:36 pm
    Tanner,

    Currently Solr will only make suggestions for words that are not in the dictionary, unless you specify "spellcheck.onlyMorePopular=true". However, if you do that, it will try to "improve" every word in your query, even the ones that are spelled correctly (so while it might change "brake" to "break", it might also change "leg" to "log").
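
    As a rough illustration of where that parameter goes, the sketch below sets it as a request-handler default in solrconfig.xml. The handler name "/spell" is made up for this example, "spellchecker" is assumed to match the dictionary name in the component config further down in this message, and "spellcheck.collate" is an extra standard parameter, not something discussed in this thread, that asks Solr to return the query rewritten with its top suggestions:

    ```xml
    <!-- Hypothetical handler; the name "/spell" is an assumption for this sketch. -->
    <requestHandler name="/spell" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="spellcheck">true</str>
        <!-- Assumed to match the name given to the spellchecker in the component config. -->
        <str name="spellcheck.dictionary">spellchecker</str>
        <!-- Suggest alternatives even for words that are already in the dictionary. -->
        <str name="spellcheck.onlyMorePopular">true</str>
        <!-- Not from the thread: return the whole query rewritten with the top suggestion per word. -->
        <str name="spellcheck.collate">true</str>
      </lst>
      <arr name="last-components">
        <!-- Refers to the searchComponent registered under the name "spellcheck". -->
        <str>spellcheck</str>
      </arr>
    </requestHandler>
    ```

    The same parameters can instead be passed on the request URL, which is handy for trying them out before committing them to the config.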

    You might be able to alleviate some of the pain by setting the "thresholdTokenFrequency" so as to remove misspelled and rarely used words from your dictionary, although I personally haven't been able to get this parameter to work. It also doesn't seem to be documented on the wiki, but it is in the 1.4.1 source code, in class IndexBasedSpellChecker. It's also mentioned in Smiley & Pugh's book. I tried setting it like this, but got a ClassCastException on the float value:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">text_spelling</str>
      <lst name="spellchecker">
        <str name="name">spellchecker</str>
        <str name="field">Spelling_Dictionary</str>
        <str name="fieldType">text_spelling</str>
        <str name="buildOnOptimize">true</str>
        <str name="thresholdTokenFrequency">.0000001</str>
      </lst>
    </searchComponent>

    I have it on my to-do list to look into this further but haven't yet. If you decide to try it and can get it to work, please let me know how you do it.

    James Dyer
    E-Commerce Systems
    Ingram Content Group
    (615) 213-4311

    -----Original Message-----
    From: Tanner Postert
    Sent: Wednesday, February 23, 2011 12:53 PM
    To: [email protected]
    Subject: Spellcheck Phrases

  • Tanner Postert at May 27, 2011 at 11:05 pm
    Are there any updates on this? Are there any third-party apps that can make
    this work as expected?
  • Dyer, James at Jun 1, 2011 at 8:02 pm
    Tanner,

    I just entered SOLR-2571 to fix the float-parsing bug that breaks "thresholdTokenFrequency". It's just a one-line code fix, so I also included a patch that should apply cleanly to Solr 3.1. See https://issues.apache.org/jira/browse/SOLR-2571 for info and patches.

    This parameter appears to be absent from the wiki, and as it has always been broken for me, I haven't tested it. However, my understanding is that it should be set to the minimum percentage of documents (written as a decimal fraction) in which a term has to occur in order for it to appear in the spelling dictionary. For instance, in the config below, a term would have to occur in at least 1% of the documents for it to be part of the spelling dictionary. This might be a good setting for long fields, but for the short fields in my application I was thinking of setting it to something like 1/1000 of 1% ...

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">text</str>
      <lst name="spellchecker">
        <str name="name">spellchecker</str>
        <str name="field">Spelling_Dictionary</str>
        <str name="fieldType">text</str>
        <str name="spellcheckIndexDir">./spellchecker</str>
        <str name="thresholdTokenFrequency">.01</str>
      </lst>
    </searchComponent>
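
    To make the arithmetic concrete, and going by the understanding described above that the value is a fraction rather than a percentage, 1% is written as .01 and 1/1000 of 1% as .00001. The fragment below is only a sketch: the 1,000,000-document index in the comment is a hypothetical figure, and the line uses the float element, which is the declaration the follow-up in this thread reports as the working one:

    ```xml
    <!-- Assumed reading: the value is a fraction of documents, not a percentage.
           .01    = 1%           (10,000 of a hypothetical 1,000,000 documents)
           .00001 = 1/1000 of 1% (10 of those 1,000,000 documents) -->
    <float name="thresholdTokenFrequency">.00001</float>
    ```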



  • Dyer, James at Jun 2, 2011 at 3:40 pm
    Actually, someone just pointed out to me that a patch like this is unnecessary. The code works as-is if configured like this:

    <float name="thresholdTokenFrequency">.01</float> (correct)

    instead of this:

    <str name="thresholdTokenFrequency">.01</str> (incorrect)

    I tested this and it seems to work. I'm still trying to figure out whether using this parameter actually improves the quality of our spell suggestions, now that I know how to use it properly.
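
    Put together, a sketch of the full component with the threshold declared as a float might look like the following. The field and field type names are simply the ones from the first config in this thread, and the .01 value is illustrative rather than a recommendation:

    ```xml
    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">text_spelling</str>
      <lst name="spellchecker">
        <str name="name">spellchecker</str>
        <str name="field">Spelling_Dictionary</str>
        <str name="fieldType">text_spelling</str>
        <str name="buildOnOptimize">true</str>
        <!-- Declared as a float element rather than a str element,
             which is the form reported above to avoid the ClassCastException. -->
        <float name="thresholdTokenFrequency">.01</float>
      </lst>
    </searchComponent>
    ```

    Since the threshold decides which terms go into the dictionary, a changed value presumably only takes effect once the spelling dictionary has been rebuilt.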

    Sorry about the misinformation earlier.



