FAQ
Hi,

We are using Lucene in a chemistry database, and we are dealing with special
words that contain both digits and letters of the English alphabet, such as
PFC-0234. To prevent Lucene from cutting the word in two, we have replaced
all dashes with underscores, so PFC-0234 is stored and indexed as PFC_0234
in the Lucene index. However, searches containing wildcard characters do not
work. For example, none of the following works: PFC_*, PFC*,
PF*, PFC_0*, PFC_02*, but PFC-0234 works. Can anyone tell me what is wrong
here? We have tried WhitespaceAnalyzer, but it's not working either.

Thanks,

Eddie


  • Erick Erickson at Aug 7, 2006 at 6:51 pm
    When you say "we've tried the whitespace analyzer", did you mean for BOTH
    indexing and searching? If you only use it for one of those, you'd see
    results like this.

    And do you use Luke? It'll let you examine your index and see what's
    *actually* in it. It's the first place I go when I don't get results I
    expect....

    See: http://www.getopt.org/luke/

    What about capitalization? Lucene is case-sensitive. Some of the analyzers
    automatically lower-case and some don't.

    If you're using the whitespace analyzer, I don't think you need to bother
    transforming the hyphen into underscore....

    Hope this helps, without more context I'm not sure what else to suggest...

    Erick
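
The point about case sensitivity and about using the same analysis for indexing and searching can be sketched with a toy model in plain Java. This is only an illustration: the two "analyzers" below are hypothetical stand-ins written for this example, not Lucene classes, and the sample text is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only: these "analyzers" are hypothetical stand-ins, not Lucene classes.
final class AnalyzerMismatchDemo {

    // Splits on whitespace and keeps case, roughly what a whitespace-only analyzer does.
    static List<String> whitespaceTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Same split, but lower-cases every term (a hypothetical lower-casing analyzer).
    static List<String> lowercasedTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : whitespaceTerms(text)) {
            terms.add(t.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // A term matches only if the exact, case-sensitive term is in the index.
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        // Index side: whitespace analysis, case preserved.
        List<String> index = whitespaceTerms("compound PFC-0234 batch 7");

        // Query side: once with the same analysis, once with a lower-casing one.
        String sameAnalysis = whitespaceTerms("PFC-0234").get(0);
        String otherAnalysis = lowercasedTerms("PFC-0234").get(0);

        System.out.println(sameAnalysis + " -> " + matches(index, sameAnalysis));
        System.out.println(otherAnalysis + " -> " + matches(index, otherAnalysis));
    }
}
```

In this model the query term only hits when it went through the same analysis as the indexed text, which is why checking the real index contents with Luke is a useful first step.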
  • Nicolas Lalevée at Aug 7, 2006 at 6:51 pm

    For this type of field value, you should index the field with the index
    property Field.Index.UN_TOKENIZED.

    cheers,
    Nicolas
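
As a rough illustration of why an untokenized field helps with trailing wildcards, here is a toy model in plain Java. It does not use Lucene: the tokenizing method is a hypothetical analyzer invented for this sketch (the thread does not say which analyzer was actually in use or what it did to PFC_0234), and the prefix check only mimics the idea of a query such as PFC_*. In the Lucene API of that period the suggestion would presumably be written along the lines of `new Field("compound", "PFC_0234", Field.Store.YES, Field.Index.UN_TOKENIZED)`, where the field name `compound` is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: plain Java, no Lucene classes involved.
final class UntokenizedFieldDemo {

    // Hypothetical tokenizing analyzer: splits on anything that is not
    // an ASCII letter or digit and lower-cases each piece.
    static List<String> tokenizedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Untokenized: the whole value is kept as one term, unchanged.
    static List<String> untokenizedTerms(String value) {
        return Collections.singletonList(value);
    }

    // Trailing-wildcard match such as PFC_*: true if any indexed term
    // starts with the given prefix (case-sensitive).
    static boolean prefixMatches(List<String> terms, String prefix) {
        for (String t : terms) {
            if (t.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] prefixes = {"PFC_", "PFC", "PF", "PFC_0", "PFC_02"};
        List<String> tokenized = tokenizedTerms("PFC_0234");
        List<String> untokenized = untokenizedTerms("PFC_0234");

        for (String p : prefixes) {
            System.out.println(p + "*  tokenized=" + prefixMatches(tokenized, p)
                    + "  untokenized=" + prefixMatches(untokenized, p));
        }
    }
}
```

With the value kept as a single unchanged term, every prefix from the original question matches in this model, while the split and lower-cased terms match none of them.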

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Yiqun "Eddie" Cao at Aug 7, 2006 at 10:41 pm
    Setting field to Field.Index.UN_TOKENIZED works perfectly. Thanks to all.

    Regards,

    Eddie

Discussion Overview
group: java-user @
categories: lucene
posted: Aug 7, '06 at 5:28p
active: Aug 7, '06 at 10:41p
posts: 4
users: 3
website: lucene.apache.org