So I'm using Snowball Analyzer on a field for business titles. The
value "Charlie's Sandwich Shoppe" becomes "charli sandwich shopp". This
happens partly because the StandardAnalyzer strips off the apostrophe-s
entirely, and then the Snowballer takes off the e. The problem is that
when someone searches for "Charlies", without the apostrophe, they get
no match, because in that case the Snowballer produces "charl" as the
term. Thoughts on the best approach for solving this? Do I expand it to
become "{charl,charli} sandwich shopp"? Should I strip apostrophes
before feeding the beast?



Thanks

--Max


  • Erick Erickson at Jun 18, 2008 at 1:46 pm
    This is tricky....

    If you strip the apostrophe, you'd get interesting results from O'brien,
    depending upon how you stripped it (i.e. "closed up" the word to Obrien or
    substituted a space, e.g. O brien). We've generally had the fewest surprises
    by closing up apostrophes (i.e. Obrien, Charlies).
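
    The two ways of stripping described above can be sketched in plain Java as
    a pre-processing step that runs before the text reaches the analyzer. This
    is only an illustration: the class ApostropheStripper and its methods are
    hypothetical names (they are not part of Lucene), and treating the
    typographic apostrophe (U+2019) the same as the ASCII one is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Lucene class: strips apostrophes from raw text
// before it is handed to an analyzer.
final class ApostropheStripper {
    // Assumption: both the ASCII apostrophe and U+2019 should be handled.
    private static final Pattern APOSTROPHE = Pattern.compile("['\u2019]");

    private ApostropheStripper() {}

    // "Close up" the word: O'brien -> Obrien, Charlie's -> Charlies.
    static String closeUp(String text) {
        return APOSTROPHE.matcher(text).replaceAll("");
    }

    // Substitute a space instead: O'brien -> O brien.
    static String toSpace(String text) {
        return APOSTROPHE.matcher(text).replaceAll(" ");
    }
}
```

    With this sketch, closeUp("O'brien") gives "Obrien" while toSpace("O'brien")
    gives "O brien", which is the difference in behaviour discussed above.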

    Unfortunately, anything you do will be wrong in some cases. You can either
    do something simple like the above, or, say, generate a dictionary that you
    use. That is, basically keep a record of all the exceptions to your simple
    rule and transform the input before feeding the analyzer.
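
    One possible shape for the dictionary approach, as a rough sketch in plain
    Java: look each whitespace-separated token up in a table of exceptions
    first, and fall back to the simple close-up rule otherwise. The class name
    ExceptionAwareNormalizer is hypothetical, and which entries belong in the
    table is something you would have to work out from your own data.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical pre-processor, not a Lucene class: applies per-token
// exceptions before falling back to closing up apostrophes.
final class ExceptionAwareNormalizer {
    // Keys are stored lower-cased so lookups are case-insensitive.
    private final Map<String, String> exceptions = new HashMap<>();

    ExceptionAwareNormalizer(Map<String, String> exceptions) {
        for (Map.Entry<String, String> e : exceptions.entrySet()) {
            this.exceptions.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
    }

    String normalize(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for empty or all-whitespace input
            }
            String replacement = exceptions.get(token.toLowerCase(Locale.ROOT));
            if (replacement == null) {
                // Simple rule: close up ASCII and typographic apostrophes.
                replacement = token.replace("'", "").replace("\u2019", "");
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(replacement);
        }
        return out.toString();
    }
}
```

    For example, with a single made-up exception mapping "Rock'n'Roll" to
    "rock and roll", normalize("Charlie's Rock'n'Roll Diner") returns
    "Charlies rock and roll Diner".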

    Personally, though, I'd close up the apostrophe and feed the analyzer. Don't
    forget to do the same for the query.

    Best
    Erick

    You know, my job would be a lot easier if English were regularized. Sign my
    petition now!


Discussion Overview
group: java-user @
categories: lucene
posted: Jun 17, '08 at 9:17p
active: Jun 18, '08 at 1:46p
posts: 2
users: 2
website: lucene.apache.org
