Spellchecker should not be case-sensitive and should be stopwords-aware
-----------------------------------------------------------------------

Key: SOLR-630
URL: https://issues.apache.org/jira/browse/SOLR-630
Project: Solr
Issue Type: Bug
Components: spellchecker
Reporter: Otis Gospodnetic
Fix For: 1.3


Here are 2 more bugs:

1)
Search for:
united states of America
Suggests:
united states oft America

It looks like the SC doesn't check stopwords, and "of" is a stopword. Thus, it does not exist in the index,
but "oft" does, so SC suggests "oft" and thinks "of" is misspelled. I think the SC component should check the list of
stopwords, too, no?
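
The mechanism described above can be sketched in a few lines. This is only an illustration, not Solr's or Lucene's implementation: the stopword list and the helper names `build_index` and `check` are hypothetical, and `difflib.get_close_matches` stands in for the real suggestion logic.

```python
import difflib

STOPWORDS = {"of", "the", "a"}  # hypothetical stopword list, for illustration only

def build_index(docs):
    """Index-time analysis: lowercase, split on whitespace, drop stopwords."""
    return {tok for doc in docs for tok in doc.lower().split() if tok not in STOPWORDS}

def check(query, index, skip_stopwords):
    """Return {term: suggestion} for query terms that are missing from the index."""
    suggestions = {}
    for tok in query.lower().split():
        if skip_stopwords and tok in STOPWORDS:
            continue  # a stopword is absent from the index by design, not misspelled
        if tok not in index:
            # difflib is a stand-in for the spellchecker's own similarity search
            close = difflib.get_close_matches(tok, sorted(index), n=1)
            if close:
                suggestions[tok] = close[0]
    return suggestions

index = build_index(["United States of America", "oft repeated"])
naive = check("united states of America", index, skip_stopwords=False)
aware = check("united states of America", index, skip_stopwords=True)
```

With the stopword check turned off, "of" is treated as unknown and corrected to "oft"; with it turned on, no correction is offered.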

2)
Search for:
united states of America
Suggests:
united states oft Americaa

The of->oft is described above. But note how the SC suggested America->Americaa, while it didn't do that for "america".
This looks like a case-sensitivity problem. Shouldn't the SC be case-insensitive?

I can't produce a patch now (no src handy), so I'm hoping Grant or somebody else can do it based on this report.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

  • Grant Ingersoll (JIRA) at Aug 5, 2008 at 9:21 pm
    [ https://issues.apache.org/jira/browse/SOLR-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Grant Ingersoll updated SOLR-630:
    ---------------------------------

    Priority: Minor (was: Major)

    Not major.
  • Grant Ingersoll (JIRA) at Aug 12, 2008 at 1:31 pm
    [ https://issues.apache.org/jira/browse/SOLR-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Grant Ingersoll updated SOLR-630:
    ---------------------------------

    Fix Version/s: 1.4 (was: 1.3)

    Doesn't seem to be anyone taking this up, so marking it as 1.4.
  • Alex Baranov (JIRA) at Aug 19, 2009 at 3:43 am
    [ https://issues.apache.org/jira/browse/SOLR-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744860#action_12744860 ]

    Alex Baranov commented on SOLR-630:
    -----------------------------------

    I would propose to close this bug.

    1) of->oft
    Whether stop words are omitted or not depends on which parameter carries the query:
    a. If the "q" parameter is used, then the "queryAnalyzerFieldType" parameter determines the analyzer for the query. If "queryAnalyzerFieldType" is not specified, then WhitespaceTokenizer is used.
    b. If the "spellcheck.q" parameter is used, then the query analyzer of the spellchecker field is used.
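
    The two cases above can be written down as a small decision function. This is a sketch of the rule as stated in this comment, not Solr code: the function name `choose_analysis` and the field type name "textSpell" are hypothetical, and giving "spellcheck.q" precedence when both parameters are present is an assumption.

```python
def choose_analysis(params, query_analyzer_field_type=None):
    """Describe which analysis chain would tokenize the spellcheck input."""
    # Assumption: "spellcheck.q" wins when both parameters are supplied.
    if "spellcheck.q" in params:
        return "query analyzer of the spellchecker field"
    if "q" in params:
        if query_analyzer_field_type is not None:
            return f"query analyzer of field type '{query_analyzer_field_type}'"
        # No queryAnalyzerFieldType configured: whitespace splitting only,
        # so stop words such as "of" reach the spellchecker unchanged.
        return "WhitespaceTokenizer"
    return "nothing to spellcheck"

via_q_default = choose_analysis({"q": "united states of America"})
via_q_typed = choose_analysis({"q": "united states of America"}, "textSpell")
via_spellcheck_q = choose_analysis({"spellcheck.q": "united states of America"})
```

    Only the whitespace-only path leaves stop words in the token stream by construction; in the other two paths it depends on whether the chosen analyzer includes a stop filter.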

    2) America->Americaa, america->[none]
    I couldn't reproduce that. The results are the same for "America" as for "america". However, the spellchecker really is case-sensitive. For example, if "AmErIcAa" is in the spellchecker index, then this suggestion will appear for neither "America" nor "america", but it would appear for "AmErIcA".
    The reason for America->Americaa and america->Americaa lies in the n-gram method used by the Lucene spellchecker: the same grams are defined for America and america, and the only difference is the "startN" gram. There still might be a difference in the results, though: the method boosts the relevance of a suggestion if its first N letters are the same as those of the word under spellcheck.
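
    To make the n-gram argument concrete, here is a minimal sketch that splits both spellings into character 3-grams and compares them. It is a simplification for illustration, not the Lucene SpellChecker code: the helpers `ngrams`, `start_gram` and `compare` are hypothetical, and the gram size of 3 is an assumed value.

```python
def ngrams(word, n):
    """All contiguous character n-grams of word, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def start_gram(word, n):
    """The leading n-gram, which the comment above calls the "startN" gram."""
    return word[:n]

def compare(a, b, n):
    """Return (grams shared by both words, grams found in only one of them)."""
    grams_a, grams_b = set(ngrams(a, n)), set(ngrams(b, n))
    return grams_a & grams_b, grams_a ^ grams_b

shared, differing = compare("America", "america", 3)
starts_match = start_gram("America", 3) == start_gram("america", 3)
```

    In this sketch four of the five 3-grams are shared and only the gram that contains the first letter differs, which is why both spellings can retrieve the same candidates while a boost on the start gram may still rank them differently.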

    I'm not sure whether case-sensitiveness (is it a word?) is a bug or not. Anyway, finding suggestions, as well as creating the index for the spellchecker, is delegated to the Lucene SpellChecker, so this is a Lucene issue, not a Solr one.

    P.S. I believe that one can avoid case-sensitive issue by configuring properly the analyzers (e.g. for the spellchecker field).
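
    A rough sketch of what "configuring the analyzers properly" buys: if terms are lowercased when the spellchecker index is built, the query terms have to be lowercased as well, otherwise a correctly spelled "America" looks unknown. The `analyze` and `unknown_terms` helpers are hypothetical stand-ins for an analyzer chain, not Solr APIs.

```python
def analyze(text, lowercase):
    """Toy analyzer: whitespace tokenization with optional lowercasing."""
    tokens = text.split()
    return [t.lower() for t in tokens] if lowercase else tokens

# The index side lowercases its terms.
index_terms = set(analyze("United States America", lowercase=True))

def unknown_terms(query, lowercase_query):
    """Query terms that the index does not contain after query-side analysis."""
    return [t for t in analyze(query, lowercase_query) if t not in index_terms]

mismatched = unknown_terms("united states America", lowercase_query=False)
consistent = unknown_terms("united states America", lowercase_query=True)
```

    With mismatched analysis only the capitalized "America" is reported as unknown, which resembles the behaviour in the original report; with the same lowercasing on both sides nothing is flagged.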
    Spellchecker should not be case-sensitive and should be stopwords-aware
    -----------------------------------------------------------------------

    Key: SOLR-630
    URL: https://issues.apache.org/jira/browse/SOLR-630
    Project: Solr
    Issue Type: Bug
    Components: spellchecker
    Reporter: Otis Gospodnetic
    Priority: Minor
    Fix For: 1.5


  • Hoss Man (JIRA) at Aug 26, 2009 at 12:55 am
    [ https://issues.apache.org/jira/browse/SOLR-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747724#action_12747724 ]

    Hoss Man commented on SOLR-630:
    -------------------------------

    bq. P.S. I believe that one can avoid case-sensitive issue by configuring properly the analyzers (e.g. for the spellchecker field).

    yeah ... without a concrete example of what kind of config can produce these bugs, my gut assumption is that with *some* config for spellchecker this problem doesn't exist.

    at which point this bug really just becomes an issue if our current example/documentation isn't advocating the best solution.



  • Shalin Shekhar Mangar (JIRA) at Dec 15, 2009 at 11:21 am
    [ https://issues.apache.org/jira/browse/SOLR-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shalin Shekhar Mangar resolved SOLR-630.
    ----------------------------------------

    Resolution: Invalid

    I don't think this is a problem. As Alex noted, it is all a matter of configuring your analyzers and spell checker correctly.

Discussion Overview
group: solr-dev @
categories: lucene
posted: Jul 14, '08 at 8:55p
active: Dec 15, '09 at 11:21a
posts: 6
users: 1
website: lucene.apache.org...
