  • Smokey at Dec 3, 2007 at 3:23 pm
    My question is for anyone who has experience with Lucene's SpellChecker,
    especially around its performance characteristics/ramifications.

    1. Given the fact that SpellChecker expands a query by adding all the
    permutations of a potentially misspelled word, how does it perform in general?

    2. How are others handling the case where SpellChecker would NOT perform
    well if you expand the query by adding all the permutations? In other words,
    what kind of techniques are people using to get around or alleviate the
    performance hit, if any?

    Any sharing of information or pointers would be appreciated.
  • Doron Cohen at Dec 4, 2007 at 6:02 am
    I didn't have performance issues when using the spell checker.
    Can you describe what you tried and how long it took, so that
    people can relate to it?

    AFAIK the spell checker in o.a.l.search.spell does not "expand
    a query by adding all the permutations of a potentially misspelled
    word". It is based on building an auxiliary index whose *documents*
    are the *words* of the main index, each run through n-gram tokenization.
    A checked word is tokenized the same way and then used as a query on
    the auxiliary index.

    There's more wisdom in the query tokenization than this,
    but a simplified example can help to show how it works:
    - a misspelled word 'helo' is tokenized as 'he el lo',
    - the auxiliary index contains a document for the correct
    word 'hello' that was tokenized as 'he el ll lo',
    - the score of the document 'hello' would be high when searching
    the auxiliary index for 'he el lo'.
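
    To make the example above concrete, here is a minimal sketch in plain
    Java. It is not Lucene's implementation: the class BigramSpellSketch and
    its methods are hypothetical names, it uses only 2-character grams, and
    it "scores" a word simply by counting shared grams, whereas the real
    spell checker does more than that.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy in-memory stand-in for the auxiliary index (hypothetical, not Lucene code). */
final class BigramSpellSketch {
    // gram -> lexicon words containing that gram (a tiny inverted index)
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Splits a word into overlapping 2-character grams, e.g. "helo" -> [he, el, lo]. */
    static List<String> bigrams(String word) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            grams.add(word.substring(i, i + 2));
        }
        return grams;
    }

    /** Adds one lexicon word as a "document" whose terms are its bigrams. */
    void addWord(String word) {
        for (String gram : bigrams(word)) {
            postings.computeIfAbsent(gram, g -> new HashSet<>()).add(word);
        }
    }

    /** For each indexed word, counts how many distinct grams of the input it contains. */
    Map<String, Integer> matchCounts(String misspelled) {
        Map<String, Integer> counts = new HashMap<>();
        for (String gram : new HashSet<>(bigrams(misspelled))) {
            for (String word : postings.getOrDefault(gram, Collections.emptySet())) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the indexed word sharing the most grams with the input, or null if none. */
    String bestMatch(String misspelled) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : matchCounts(misspelled).entrySet()) {
            int count = e.getValue();
            // Ties are broken alphabetically so the result is deterministic.
            boolean wins = count > bestCount
                    || (count == bestCount && best != null && e.getKey().compareTo(best) < 0);
            if (wins) {
                best = e.getKey();
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BigramSpellSketch index = new BigramSpellSketch();
        index.addWord("hello");
        index.addWord("help");
        index.addWord("world");
        System.out.println(index.bestMatch("helo")); // prints "hello"
    }
}
```

    Here 'hello' shares three grams with 'helo' (he, el, lo), 'help' shares
    two, and 'world' shares none. The lookup cost depends on the number of
    grams in the checked word, not on enumerating spelling variants.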

    The only performance hit is when refreshing/rebuilding the
    auxiliary index after the lexicon of the actual index
    has changed a lot. But this can be done in the background, whenever
    that suits the application using Lucene and the spell checker.
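
    One possible shape for such a background rebuild, sketched in plain
    Java under stated assumptions: the class BackgroundRebuildSketch and
    its methods are hypothetical, and a plain Set of words stands in for
    the auxiliary index. The idea is only that a fresh copy is built off
    the calling thread and swapped in once complete, so lookups never see
    a half-built index.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: rebuild a lexicon snapshot in the background, then swap it in. */
final class BackgroundRebuildSketch {
    // Readers always see a complete snapshot; a rebuild replaces it atomically.
    private final AtomicReference<Set<String>> lexicon =
            new AtomicReference<>(Collections.<String>emptySet());

    /** Builds a fresh snapshot on a background thread and publishes it when done. */
    CompletableFuture<Void> rebuildAsync(Collection<String> mainIndexTerms) {
        return CompletableFuture.runAsync(() -> {
            // In a real setup this copy would be the slow auxiliary-index rebuild.
            Set<String> fresh = new HashSet<>(mainIndexTerms);
            lexicon.set(Collections.unmodifiableSet(fresh));
        });
    }

    /** Lookups read whichever snapshot is current and never block on a rebuild. */
    boolean isKnownWord(String word) {
        return lexicon.get().contains(word);
    }

    public static void main(String[] args) {
        BackgroundRebuildSketch sketch = new BackgroundRebuildSketch();
        CompletableFuture<Void> done = sketch.rebuildAsync(Arrays.asList("hello", "world"));
        done.join(); // wait only so the demo output is deterministic
        System.out.println(sketch.isKnownWord("hello")); // prints "true"
    }
}
```

    Until the rebuild finishes, lookups are answered from the previous
    snapshot, which is usually acceptable for spelling suggestions.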

    Doron

