Hi,

Any good approaches for allowing case sensitive and case insensitive
searches?

/
Regards
Marcus


  • Marcus Falck at Oct 6, 2006 at 9:13 am
    Other than adding an additional field that skips the LowerCaseFilter, that is. That approach severely increases the index size (and the index is already around 1 TB).



    -----Original message-----
    From: Marcus Falck
    Sent: 6 October 2006 11:09
    To: java-user@lucene.apache.org
    Subject: Case sensitive / insensitive

    Hi,

    Any good approaches for allowing case sensitive and case insensitive
    searches?

    /
    Regards
    Marcus






    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Karl Wettin at Oct 6, 2006 at 10:18 am

    On 10/6/06, Marcus Falck wrote:
    Any good approaches for allowing case sensitive and case insensitive
    searches?

    Except adding an additional field and skipping the LowerCaseFilter. Since this
    severely increases the index size (and the index already is around 1 TB).

    I would consider whether all terms really have to be indexed case
    sensitively, or just a subset of them. For example, all proper nouns
    may need to be case sensitive.

  • Steven Rowe at Oct 6, 2006 at 2:50 pm

    Marcus Falck wrote:
    Any good approaches for allowing case sensitive and case insensitive
    searches?

    Except adding an additional field and skipping the LowerCaseFilter.
    Since this severely increases the index size (and the index already
    is around 1 TB).

    Hi Marcus,

    How about a filter that emits two tokens for every token that is not
    fully lowercase: first the original, and then the downcased version,
    with both placed at the same position? This should minimize index growth.

    Something like this (WARNING: Not Tested!!):

    -----------begin DualCaseFilter.java-------------

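    package org.apache.lucene.analysis;

    import java.io.IOException;

    /**
     * Emits every token unchanged and, for tokens that are not already fully
     * lowercase, a lowercased copy at the same position (position increment 0).
     */
    public final class DualCaseFilter extends TokenFilter {
        // Lowercased copy of the previous token, waiting to be emitted.
        private Token downcasedPreviousToken = null;

        public DualCaseFilter(TokenStream input) {
            super(input);
        }

        public final Token next() throws IOException {
            // Emit the pending lowercased copy before reading more input.
            if (downcasedPreviousToken != null) {
                Token t = downcasedPreviousToken;
                downcasedPreviousToken = null;
                return t;
            }
            Token t = input.next();
            if (t != null) {
                String downcased = t.termText().toLowerCase();
                if (!t.termText().equals(downcased)) {
                    // Same offsets and type as the original, stacked at the same position.
                    downcasedPreviousToken =
                        new Token(downcased, t.startOffset(), t.endOffset(), t.type());
                    downcasedPreviousToken.setPositionIncrement(0);
                }
            }
            return t;
        }
    }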

    -----------end DualCaseFilter.java-------------

    Hope it helps,
    Steve
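
To make the effect of the filter above easier to see, here is a rough standalone sketch that does not use Lucene at all. `SimpleToken`, `dualCase`, and `matches` are hypothetical names invented for this illustration; they only mimic the idea of stacking a lowercased copy at position increment 0. The assumption is that a case-insensitive search lowercases the query term before lookup, while a case-sensitive search uses it as typed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Standalone illustration only; SimpleToken is a hypothetical stand-in for Lucene's Token. */
final class DualCaseSketch {

    /** A term plus its position increment (0 means "same position as the previous token"). */
    static final class SimpleToken {
        final String text;
        final int positionIncrement;

        SimpleToken(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    /** Emits each word as-is, then a lowercased copy at the same position if the two differ. */
    static List<SimpleToken> dualCase(String[] words) {
        List<SimpleToken> out = new ArrayList<>();
        for (String word : words) {
            out.add(new SimpleToken(word, 1));
            String downcased = word.toLowerCase(Locale.ROOT);
            if (!word.equals(downcased)) {
                out.add(new SimpleToken(downcased, 0));
            }
        }
        return out;
    }

    /** Case-sensitive lookups use the term as typed; case-insensitive lookups lowercase it first. */
    static boolean matches(List<SimpleToken> indexed, String queryTerm, boolean caseSensitive) {
        String wanted = caseSensitive ? queryTerm : queryTerm.toLowerCase(Locale.ROOT);
        for (SimpleToken token : indexed) {
            if (token.text.equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<SimpleToken> tokens = dualCase(new String[] {"Visit", "Apple", "today"});
        System.out.println(tokens);
        System.out.println(matches(tokens, "APPLE", false));
        System.out.println(matches(tokens, "APPLE", true));
        System.out.println(matches(tokens, "apple", true));
    }
}
```

One thing the sketch brings out: because the original and the lowercased copy share a field, a case-sensitive search for an all-lowercase term such as "apple" would also hit text that originally read "Apple". Whether that matters depends on the application.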

  • Erik Hatcher at Oct 6, 2006 at 10:01 am

    On Oct 6, 2006, at 5:09 AM, Marcus Falck wrote:
    Any good approaches for allowing case sensitive and case insensitive
    searches?

    I had this requirement for one application, and implemented it with
    two different indexes. It could also be accomplished with different
    fields, but that would have been even more complex, since it needs a
    custom query parser to toggle which fields are being used.

    Erik
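
As a rough illustration of the "toggle" mentioned above, the following plain-Java sketch shows only the decision a custom query layer would have to make: which field (or, equally, which index) to target, and whether to lowercase the term. The field names `contents` and `contents_exact` and the class `CaseModeSketch` are hypothetical; the thread does not name any, and this is not how Erik's application was actually built.

```java
import java.util.Locale;

/** Hypothetical sketch: routes a user term to a case-folded or an exact-case field. */
final class CaseModeSketch {

    // Assumed field names, chosen for this example only.
    static final String FOLDED_FIELD = "contents";        // indexed with lowercasing
    static final String EXACT_FIELD = "contents_exact";   // indexed without lowercasing

    /** The field to search plus the term text to look up in it. */
    static final class FieldTerm {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /** Case-sensitive searches keep the term as typed; others are lowercased to match the folded field. */
    static FieldTerm resolve(String userTerm, boolean caseSensitive) {
        if (caseSensitive) {
            return new FieldTerm(EXACT_FIELD, userTerm);
        }
        return new FieldTerm(FOLDED_FIELD, userTerm.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        FieldTerm exact = resolve("Lucene", true);
        FieldTerm folded = resolve("Lucene", false);
        System.out.println(exact.field + ":" + exact.text);
        System.out.println(folded.field + ":" + folded.text);
    }
}
```

With two separate indexes, the same decision would pick the index to search instead of the field name.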



Discussion Overview
group: java-user
categories: lucene
posted: Oct 6, '06 at 9:09a
active: Oct 6, '06 at 2:50p
posts: 5
users: 4
website: lucene.apache.org
