I found some discussions of this question from back in 2003, but that was
many updates ago.

I have built an index using the standard stop analyzer, which uses the
standard list of stop words. "will" and "the" are stop words.

As I understand analyzers and phrase queries, when I search for

you will find the answer

using the default slop of 0, I should find any pattern like

you <any stop word> find <any stop word> answer

because the analyzer replaces "will" and "the" in the query with a space
indicator as it did when analyzing the original input text. Instead, I find
phrases such as

you find an answer

"an" is a stop word, so matching "find an answer" is as expected, but there
is no stop word between "you" and "find" in the original input string. I do
not see why "you find an answer" matches.

What am I doing wrong?


Also, when I try to highlight after searching for a phrase, the highlighter
highlights individual words wherever it finds them in the input text. The
documentation suggests that if I use the right scoring system, I will
highlight only long strings of adjacent tokens which are found in the
phrase, but I am not sure how to do that.

If necessary, I will paste in samples of my code for creating the indexes
and doing the search.


Thanks.

Bill Taylor


  • Erick Erickson at Apr 13, 2007 at 2:04 am
    As I understand it, there really is no "space indicator". I think of it
    as replacing the stop word with a space, which is then discarded.

    So you're indexing 'you find answer', and both your searches are
    looking for 'you find answer'. The stop words are just gone, as though
    they never were. So both queries match.

    But I've been wrong before <G>...

    I can't really speak to the highlighter question, so I'll let someone
    more knowledgeable pipe up.

    Erick
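
The behaviour Erick describes can be sketched in plain Java. This is a toy simulation, not the Lucene API: the class name StopWordDemo, its methods, and the three-word stop list (only the words named in this thread) are all assumptions made for illustration. It shows why a slop-0 phrase match succeeds once stop words are dropped without leaving any gap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordDemo {
    // Illustrative subset only: the three stop words named in this thread.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("will", "the", "an"));

    // Lowercase, split on whitespace, and drop stop words without leaving a gap.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    // Slop-0 phrase match: the analyzed query tokens must appear
    // consecutively in the analyzed document tokens.
    static boolean phraseMatches(String document, String query) {
        return Collections.indexOfSubList(analyze(document), analyze(query)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(analyze("you will find the answer")); // [you, find, answer]
        System.out.println(analyze("you find an answer"));       // [you, find, answer]
        System.out.println(
                phraseMatches("you find an answer", "you will find the answer")); // true
    }
}
```

Both the query and the document reduce to the same three tokens, so under this model the phrase matches even though the document has no stop word between "you" and "find".
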
  • Paul Elschot at Apr 13, 2007 at 8:04 pm

    On Friday 13 April 2007 04:04, Erick Erickson wrote:
    > As I understand it, there really is no "space indicator". I think of it
    > as replacing the stop word with a space, which is then discarded.

    You can replace all stop words with your own special term value
    to get a space indicator.

    It is also possible to index nothing at a particular position, for example
    at the position of a stop word. This gives a "gap" in the index;
    see below.

    On 4/12/07, Bill Taylor wrote:
    > What am I doing wrong?

    The problem may be that you expect a gap in the index.
    When there is a gap in the index, it is also necessary to adapt
    the analyzer used for the phrase query to query for a gap.
    I don't know whether PhraseQuery can handle such an analyzer.

    To have a gap in the index, you need to change your analyzer
    to add a gap for each stop word. This can be done by changing the
    position increment when a stop word is encountered; see
    Token.setPositionIncrement(). IIRC you need to write a variation
    on StopFilter for this.

    Regards,
    Paul Elschot
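
The gap idea Paul describes can be sketched the same way. Again, this is a plain-Java simulation and not Lucene code: PositionGapDemo, its Token class, and the three-word stop list are hypothetical names invented here. Stop words are dropped but still consume a position, which has the same effect as giving the next token a position increment of 2. The query is analyzed with the same gaps, as Paul says it must be. Whether PhraseQuery itself supports this is left open, as in his reply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PositionGapDemo {
    // Illustrative subset only: the three stop words named in this thread.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("will", "the", "an"));

    static final class Token {
        final String term;
        final int position;
        Token(String term, int position) { this.term = term; this.position = position; }
    }

    // Drop stop words, but let each one consume a position so a gap remains.
    static List<Token> analyzeWithGaps(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            if (!STOP_WORDS.contains(word)) tokens.add(new Token(word, position));
            position++; // advances for stop words too
        }
        return tokens;
    }

    // Slop-0 match that respects gaps: every query token must be found at the
    // same relative offset from the first query token as in the query itself.
    static boolean phraseMatches(String document, String query) {
        List<Token> queryTokens = analyzeWithGaps(query);
        if (queryTokens.isEmpty()) return false;
        List<Token> docTokens = analyzeWithGaps(document);
        Set<String> indexed = new HashSet<>();
        for (Token t : docTokens) indexed.add(t.term + "@" + t.position);
        Token first = queryTokens.get(0);
        for (Token start : docTokens) {
            if (!start.term.equals(first.term)) continue;
            boolean allFound = true;
            for (Token q : queryTokens) {
                int expected = start.position + (q.position - first.position);
                if (!indexed.contains(q.term + "@" + expected)) { allFound = false; break; }
            }
            if (allFound) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String query = "you will find the answer";
        System.out.println(phraseMatches("you find an answer", query));      // false
        System.out.println(phraseMatches("you will find an answer", query)); // true
    }
}
```

With gaps kept on both sides, "you find an answer" no longer matches, while text of the form "you <stop word> find <stop word> answer" still does, which is the behaviour Bill expected.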




Discussion Overview
group: java-user
category: lucene
posted: Apr 13, '07 at 1:41a
active: Apr 13, '07 at 8:04p
posts: 3
users: 3
website: lucene.apache.org
