On Friday 13 April 2007 04:04, Erick Erickson wrote:
As I understand it, there really is no "space indicator". I think of it
as replacing the stop word with a space, which is then discarded.
You can replace all stop words with a special term value of your own
to get a space indicator.
It is also possible to index nothing at a particular position, for example
at the position of a stop word. This gives a "gap" in the index.
So you're indexing 'you find answer', and both your searches are
looking for 'you find answer'; the stop words are just gone, as though
they never were. So both queries match.
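The collapsing described above can be sketched with a toy model. This is plain Java, not Lucene code: the class name StopWordCollapseDemo, the four-word stop list, and the whitespace tokenizer are all assumptions made for illustration. It only shows why a slop-0 phrase match succeeds once stop words are dropped without leaving a gap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: this is not Lucene and uses no Lucene classes.
class StopWordCollapseDemo {
    // Assumed stop list for the example; the real standard list is longer.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("will", "the", "an", "a"));

    // Lower-case, split on whitespace, and drop stop words without leaving a gap.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (!STOP_WORDS.contains(word)) {
                terms.add(word);
            }
        }
        return terms;
    }

    // Slop-0 phrase match: the analyzed query terms must appear
    // consecutively, in order, among the analyzed document terms.
    static boolean phraseMatches(String document, String query) {
        List<String> doc = analyze(document);
        List<String> q = analyze(query);
        return !q.isEmpty() && Collections.indexOfSubList(doc, q) >= 0;
    }

    public static void main(String[] args) {
        // Both the query and the document reduce to the same three terms.
        System.out.println(analyze("you will find the answer")); // prints [you, find, answer]
        System.out.println(analyze("you find an answer"));       // prints [you, find, answer]
        System.out.println(
            phraseMatches("you find an answer", "you will find the answer")); // prints true
    }
}
```

In this model the query 'you will find the answer' and the text 'you find an answer' are indistinguishable after analysis, which is the behaviour reported in the question.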
But I've been wrong before <G>...
I can't really speak to the highlighter question, so I'll let someone
more knowledgeable pipe up.
On 4/12/07, Bill Taylor wrote:
I found some discussions of this question from back in 2003, but that was
many updates ago.
I have built an index using the standard stop analyser which uses the
standard list of stop words. "will" and "the" are stop words.
As I understand analyzers and phrase queries, when I search for
you will find the answer
using the default slop of 0, I should find any pattern like
you <any stop word> find <any stop word> answer
because the analyzer replaces "will" and "the" in the query with a space
indicator as it did when analyzing the original input text. Instead, I find
phrases such as
you find an answer
"an" is a stop word, so matching "find an answer" is as expected, but there
is no stop word between "you" and "find" in the original input string. I do
not see why "you find an answer" matches.
What am I doing wrong?
The problem may be that you expect a gap in the index.
When there is a gap in the index, it is also necessary to adapt
the analyzer used for the phrase query to query for a gap.
I don't know whether PhraseQuery can handle such an analyzer.
To have a gap in the index, you need to change your analyzer
to add a gap for a stop word. This can be done by changing the
position increment when a stop word is encountered; see
Token.setPositionIncrement(). IIRC you need to write a variation
on StopFilter for this.
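To illustrate the idea of a gap, here is a toy model in plain Java. It is not Lucene code and not a StopFilter variation: the class PositionGapDemo, its Tok holder, and the short stop list are hypothetical names invented for this sketch. Each stop word is dropped but still advances the position counter, which is the effect a larger position increment would have, and the same analysis is applied to the query so that it asks for the same gaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: this is not Lucene and uses no Lucene classes.
class PositionGapDemo {
    // Assumed stop list for the example; the real standard list is longer.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("will", "the", "an", "a"));

    // A term together with the position it occupies in the text.
    static final class Tok {
        final String term;
        final int position;
        Tok(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Drop stop words, but let each one still consume a position (a gap).
    static List<Tok> analyze(String text) {
        List<Tok> tokens = new ArrayList<>();
        int position = 0;
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (!STOP_WORDS.contains(word)) {
                tokens.add(new Tok(word, position));
            }
            position++; // advanced for stop words too, so gaps are preserved
        }
        return tokens;
    }

    static boolean hasTermAt(List<Tok> doc, String term, int position) {
        for (Tok t : doc) {
            if (t.position == position && t.term.equals(term)) {
                return true;
            }
        }
        return false;
    }

    // Slop-0 match that respects gaps: every query term must occur in the
    // document at the same relative position it has in the query.
    static boolean phraseMatches(String document, String query) {
        List<Tok> doc = analyze(document);
        List<Tok> q = analyze(query);
        if (q.isEmpty()) {
            return false;
        }
        for (Tok start : doc) {
            if (!start.term.equals(q.get(0).term)) {
                continue;
            }
            int offset = start.position - q.get(0).position;
            boolean allFound = true;
            for (Tok qt : q) {
                if (!hasTermAt(doc, qt.term, qt.position + offset)) {
                    allFound = false;
                    break;
                }
            }
            if (allFound) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "you find an answer";
        System.out.println(phraseMatches(doc, "you will find the answer")); // prints false
        System.out.println(phraseMatches(doc, "you find the answer"));      // prints true
    }
}
```

With gaps on both sides, 'you will find the answer' no longer matches 'you find an answer', while any stop word may still stand in for another at the same position. Whether PhraseQuery can be driven this way is left open above; the sketch only models the positions.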
Also, when I try to highlight after searching for a phrase, the highlighter
highlights individual words wherever it finds them in the input text. The
documentation suggests that if I use the right scoring system, I will
highlight only long strings of adjacent tokens which are found in the
phrase, but I am not sure how to do that.
If necessary, I will paste in samples of my code for creating the indexes
and doing the search.