Hello,

An application I am building requires me to perform both case-sensitive and case-insensitive searches on the fly. That is, when searching for a particular word or phrase in my application, the user can choose whether the results are sensitive or insensitive to case, depending on the state of a checkbox. What can I do to get around this problem?

If I use StandardAnalyzer, it uses the LowerCaseFilter, so the results are all case-insensitive. If I modify the StandardAnalyzer code to remove the LowerCaseFilter, the results are all case-sensitive! What should I do?

Thanks,
Ajit


  • Otis Gospodnetic at Jun 20, 2002 at 1:30 pm
    Perhaps you could use two indices and search them with MultiSearcher.
    When you get the results from both indices, merge them, using a unique
    field to identify duplicates.

    Otis
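
The merge step described above could look roughly like the following. This is a minimal sketch in plain Java with no Lucene calls: the `HitMerger` class and the `doc-N` keys are hypothetical, and it assumes each hit has already been reduced to the value of its unique field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: merges hit lists from two indices.
class HitMerger {
    /**
     * Merges two lists of unique-field values, keeping first-seen order
     * and dropping any value that appears in both lists.
     */
    static List<String> merge(List<String> hitsFromFirstIndex,
                              List<String> hitsFromSecondIndex) {
        Set<String> seen = new LinkedHashSet<>(hitsFromFirstIndex);
        seen.addAll(hitsFromSecondIndex); // duplicates are ignored by the set
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("doc-1", "doc-7");
        List<String> second = Arrays.asList("doc-7", "doc-3");
        System.out.println(merge(first, second)); // prints [doc-1, doc-7, doc-3]
    }
}
```

Any relevance ranking would have to be re-applied after the merge; the sketch only preserves the order in which keys were first seen.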

  • Anders Nielsen at Jun 20, 2002 at 3:51 pm
    1) Modify StandardAnalyzer to remove the LowerCaseFilter

    2) Make two fields in your index, one with the text in its original form
    and one with the text lower-cased.

    3) a) If the checkbox calls for a case-insensitive search, lower-case the
    search string and search the field containing the lower-cased text.

    b) If the checkbox calls for a case-sensitive search, search the field with
    the original text without modifying the search string.


    regards,
    Anders Nielsen
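
To illustrate the two-field idea in the steps above, here is a toy in-memory sketch. It is not Lucene code: `DualFieldIndex`, the two maps standing in for an "exact" field and a "lower" field, and the whitespace tokenization are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy stand-in for an index that stores every document in two fields.
class DualFieldIndex {
    private final Map<String, Set<Integer>> exact = new HashMap<>(); // text as written
    private final Map<String, Set<Integer>> lower = new HashMap<>(); // text lower-cased

    /** Indexes each whitespace-separated word twice: as written and lower-cased. */
    void add(int docId, String text) {
        for (String word : text.trim().split("\\s+")) {
            exact.computeIfAbsent(word, k -> new LinkedHashSet<>()).add(docId);
            lower.computeIfAbsent(word.toLowerCase(Locale.ROOT),
                    k -> new LinkedHashSet<>()).add(docId);
        }
    }

    /** caseSensitive mirrors the state of the checkbox from the question. */
    List<Integer> search(String term, boolean caseSensitive) {
        Set<Integer> hits = caseSensitive
                ? exact.get(term)                            // step 3b: query untouched
                : lower.get(term.toLowerCase(Locale.ROOT));  // step 3a: lower-case the query
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        DualFieldIndex index = new DualFieldIndex();
        index.add(1, "The apple is green");
        index.add(2, "the Apple logo");
        System.out.println(index.search("Apple", true));  // prints [2]
        System.out.println(index.search("Apple", false)); // prints [1, 2]
    }
}
```

The cost is visible in the sketch: every word is stored twice, which is the index-size concern raised later in the thread.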

    -----Original Message-----
    From: AJIT RAJWADE
    Sent: June 20, 2002 14:46
    To: lucene-user@jakarta.apache.org
    Subject: Case Sensitive and Insensitive Searches BOTH needed


  • David Smiley at Jun 21, 2002 at 9:02 am
    One option that hasn't been mentioned is to code a custom filter that
    emits a case-preserved token and a lower-cased token for each word.
    The common case is to return just one token, since most words are all
    lower case anyway. This option is better than the other ideas I've
    heard here so far because the index would be smaller than with
    the dual-index and dual-field strategies suggested.

    ~ Dave Smiley
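
The filter's behavior as described can be sketched in plain Java. This deliberately avoids Lucene's TokenFilter API: `CaseExpandingFilter` and `expand` are hypothetical names, and the input is assumed to be already split into words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the custom filter; not a Lucene TokenFilter.
class CaseExpandingFilter {
    /** Emits each token as written, then its lower-cased form if the two differ. */
    static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token);
            String lowered = token.toLowerCase(Locale.ROOT);
            if (!lowered.equals(token)) {
                out.add(lowered); // only words containing upper case produce a second token
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand(Arrays.asList("The", "apple", "is", "green")));
        // prints [The, the, apple, is, green]
    }
}
```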


  • Anders Nielsen at Jun 21, 2002 at 9:22 am
    Wouldn't that make it hard to search for phrases?

  • David Smiley at Jun 22, 2002 at 3:31 am

    On Friday, June 21, 2002, at 05:06 AM, Anders Nielsen wrote:

    > Wouldn't that make it hard to search for phrases?

    If you use the same Analyzer for the query parser, then you should be
    able to search for phrases since the query itself will also go
    through the same process.

    ~ Dave


  • David Smiley at Jun 22, 2002 at 5:58 pm

    On Saturday, June 22, 2002, at 05:47 AM, Anders Nielsen wrote:

    > Let's say the text is "The apple is green" and when we run that
    > through the analyzer we get the tokens [The the apple apple is is
    > green green]. (Correct me if I'm wrong)

    Actually it would be "The the apple is green" since only one token is
    spit out if it's already lower-cased.

    > Now if we want a case-sensitive search for "The apple", you're
    > right that if we run it through the same analyzer we search for the
    > tokens [The the apple apple].
    >
    > But what if we want a case-insensitive phrase search?

    Ahh, right. *Case-insensitive phrase searches* won't work, but
    single-word searches (case-insensitive or not) and case-sensitive
    phrase searches should work. It would be nice if case sensitivity
    were handled by Lucene itself to get around this limitation
    of the phrase search. The other ideas (separate fields for
    case-sensitive versions) are a bit heavyweight since there's so much
    redundant info.

    ~ Dave Smiley
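
The phrase limitation discussed above can be reproduced with a small plain-Java sketch. It rests on two assumptions that are not confirmed in the thread: every emitted token occupies its own consecutive position, and a phrase matches only when its tokens appear as an unbroken run. The `PhraseCheck` class and the sample text "The Apple is green" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical model of phrase matching over the expanded token stream.
class PhraseCheck {
    /** Emits each token as written, then its lower-cased form if the two differ. */
    static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token);
            String lowered = token.toLowerCase(Locale.ROOT);
            if (!lowered.equals(token)) {
                out.add(lowered);
            }
        }
        return out;
    }

    /** True if the phrase tokens occur as a run of consecutive tokens in the stream. */
    static boolean containsPhrase(List<String> stream, List<String> phrase) {
        return Collections.indexOfSubList(stream, phrase) >= 0;
    }

    public static void main(String[] args) {
        // Indexed stream: [The, the, Apple, apple, is, green]
        List<String> indexed = expand(Arrays.asList("The", "Apple", "is", "green"));

        // Case-sensitive phrase, run through the same filter as the text.
        System.out.println(containsPhrase(indexed,
                expand(Arrays.asList("The", "Apple"))));      // prints true

        // Case-insensitive phrase, lower-cased before searching.
        System.out.println(containsPhrase(indexed,
                Arrays.asList("the", "apple")));              // prints false
    }
}
```

Under these assumptions the lower-cased phrase fails because "Apple" sits between "the" and "apple" in the stream. With the thread's own text, "The apple is green", the same lower-cased phrase would be found, since "apple" produces no extra token there.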



Discussion Overview
group: java-user @
categories: lucene
posted: Jun 20, '02 at 12:59p
active: Jun 22, '02 at 5:58p
posts: 7
users: 4
website: lucene.apache.org
