Hi,

I'm trying to figure out a way to locate tokens which include special characters. The actual text in the file being indexed is something like "function() { statement1; statement2; }"

The query I'm using is "function\()" since I want to locate precisely "function()". The query succeeds, but what it finds is actually "function", not "function()". If I run the same query against "function { statement1, statement2 }", it still succeeds and I get "function" in best fragments.

How can I enforce () to be included?

Thanks,
- Dmitry


  • Michael D. Curtin at Jan 27, 2006 at 10:15 pm

    Dmitry Goldenberg wrote:

    How can I enforce () to be included?

    I think you're going to have to write your own Analyzer subclass that
    keeps special characters in the terms. Then, use that Analyzer during
    indexing. The included Analyzers drop parentheses and the like.

    If you're using Lucene's QueryParser, then use your new Analyzer there,
    too, and escape things like parentheses in the query text you submit to
    parse().
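To make the difference concrete, here is a minimal plain-Java sketch of the two tokenization behaviors described above. It deliberately does not use any Lucene class: `TokenizerSketch`, `lettersAndDigitsOnly`, and `keepParentheses` are hypothetical names invented for this illustration, and a real solution would put the same character test inside a custom Analyzer and its tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; it does not use or reproduce any Lucene class.
final class TokenizerSketch {

    // Hypothetical stand-in for a stock analyzer: anything that is not a
    // letter or digit ends the current token, so "function()" becomes "function".
    static List<String> lettersAndDigitsOnly(String text) {
        return split(text, false);
    }

    // Hypothetical stand-in for a custom analyzer: '(' and ')' also count as
    // token characters, so "function()" survives as a single term.
    static List<String> keepParentheses(String text) {
        return split(text, true);
    }

    private static List<String> split(String text, boolean keepParens) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            boolean tokenChar = Character.isLetterOrDigit(c)
                    || (keepParens && (c == '(' || c == ')'));
            if (tokenChar) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator closes the token collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With this sketch, the text "function() { statement1; statement2; }" yields the term "function" under the first method and "function()" under the second, which is why the same analysis has to be applied to both the indexed text and the query text.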

    I think there's a discussion of custom Analyzers in the Lucene book, but
    I don't know where. Maybe somebody else on this list knows???

    Good luck!

    --MDC

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Dmitry Goldenberg at Jan 30, 2006 at 3:57 pm
    Michael,

    Yes, you're describing pretty much what I was thinking of but --

    a) if I index "function()" as "function()" rather than "function", does that mean that if I search for "function", then it won't be found? The problem is that in some cases the user will want to find function(), and in some cases just function. Can I accommodate both?

    b) I understand about QueryParser.escape at searching time; at indexing time though, do I still need to escape the indexed values, e.g. keyword values, and store them in the escaped fashion, e.g. function\() -- or is function() ok?

    Thanks,
    - Dmitry

    ________________________________

    From: Michael D. Curtin
    Sent: Fri 1/27/2006 2:14 PM
    To: java-user@lucene.apache.org
    Subject: Re: How to find "function()" - ?



  • Michael D. Curtin at Jan 30, 2006 at 4:59 pm

    Dmitry Goldenberg wrote:

    a) if I index "function()" as "function()" rather than "function", does that mean that if I search for "function", then it won't be found? The problem is that in some cases the user will want to find function(), and in some cases just function. Can I accommodate both?

    The term "function" is different from the term "function()", so a
    literal search for one won't find the other. Your Analyzer could emit
    two tokens for the input "function()": "function" and "function()", at
    the same position (increment 0) if that's what you want.
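The "two tokens at the same position" idea can be sketched in plain Java as follows. This is an illustration under stated assumptions, not Lucene code: `StackedTokenSketch`, `Token`, `analyze`, and `termsAt` are hypothetical names, the input is split on whitespace only, and positions are counted the way the reply describes, with an increment of 0 meaning "same position as the previous token".

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; a real implementation would live in a
// custom Lucene token filter rather than in these hypothetical classes.
final class StackedTokenSketch {

    // A term plus how far it advances the position counter.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Splits on whitespace. A token ending in "()" is emitted in full with
    // increment 1, followed by the bare name with increment 0.
    static List<Token> analyze(String text) {
        List<Token> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            out.add(new Token(raw, 1));
            if (raw.length() > 2 && raw.endsWith("()")) {
                out.add(new Token(raw.substring(0, raw.length() - 2), 0));
            }
        }
        return out;
    }

    // Returns every term that ends up at the given absolute position,
    // where the first token of the stream sits at position 0.
    static List<String> termsAt(List<Token> tokens, int position) {
        List<String> terms = new ArrayList<>();
        int current = -1;
        for (Token t : tokens) {
            current += t.positionIncrement;
            if (current == position) {
                terms.add(t.term);
            }
        }
        return terms;
    }
}
```

For the input "call function() now", position 1 holds both "function()" and "function", so a literal search for either term can match, and phrase queries still see "now" one position later.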
    b) I understand about QueryParser.escape at searching time; at indexing time though, do I still need to escape the indexed values, e.g. keyword values, and store them in the escaped fashion, e.g. function\() -- or is function() ok?

    Don't escape them at index time, only at search time.
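The reason escaping belongs only on the query side can be shown with a small plain-Java sketch. It is modeled loosely on the idea behind QueryParser.escape but does not call it: `EscapeSketch`, `escape`, and `unescape` are hypothetical helpers, and the `SPECIAL` character list is an illustrative assumption that may not match the real parser's list.

```java
// Plain-Java illustration only; not the Lucene implementation.
final class EscapeSketch {

    // Illustrative set of query-syntax characters (an assumption for this sketch).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Search time: put a backslash in front of every special character so a
    // query parser treats it as literal text instead of syntax.
    static String escape(String userInput) {
        StringBuilder sb = new StringBuilder();
        for (char c : userInput.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // What a parser conceptually does with an escaped term: drop each
    // escaping backslash and keep the character that follows it.
    static String unescape(String queryText) {
        StringBuilder sb = new StringBuilder();
        boolean escaped = false;
        for (char c : queryText.toCharArray()) {
            if (!escaped && c == '\\') {
                escaped = true;
                continue;
            }
            sb.append(c);
            escaped = false;
        }
        return sb.toString();
    }
}
```

In this sketch the user's text "function()" is escaped to `function\(\)` before parsing, and the term that comes out of parsing is "function()" again. The backslashes exist only to get through the query syntax, so the indexed value should be stored as plain "function()"; storing `function\()` would index a different term that the parsed query would not match.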

    Good luck!

    --MDC


Discussion Overview
group: java-user
categories: lucene
posted: Jan 27, '06 at 10:10p
active: Jan 30, '06 at 4:59p
posts: 4
users: 2
website: lucene.apache.org
