|
A peng |
at Jun 29, 2010 at 1:58 am
|
⇧ |
| |
Hi Erick,
Thanks for you reply, now I get the point why I can not get the search
result. But can you guide me how can I use Lucene to implement the following
search feature:
Basically we can call this feature "fuzzy phrase search", which means the
search phrase may contains more words or less words compared to the
sentences indexed in the document. Again I want to use the sample I posted
to demo it:
The document contains a sentence "This is a test", and the search phrase is
"This is a formal test", as there is only one word difference between the
two sentences, how can I get the "This is a test" phrase as the search
result?
Thanks.
2010/6/29 Erick Erickson <erickerickson@gmail.com>
No, I don't think so. The critical bit is that the indexed text
does NOT contain the word "formal". So searching for
any phrase that DOES contain "formal" should fail no matter
what the slop.
Phrase queries are something like "find all the words in this
search string, ignoring some number of intervening tokens not in the
search string". There's nothing in there about "find only some of the
words in the search string"....
I'm guessing that the original post had a typo in the success case,
because it's contradicted by "a peng's" second post. It's always
possible that I'm experiencing a brain short......
Best
Erick
On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra wrote:Hey Erick
Thanks mate!
So I guess my explanation in the mail chain above was correct!
On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <erickerickson@gmail.com
wrote:
I think you're misunderstanding the intent of PhraseQueries and slop. Slop
is the number of intervening tokens that may exist between the words
you're looking for. However, all the words you're looking for MUST
exist.
So,
<<< whenever the search phrase contains a word that don't
exist in the document, the search result will be empty >>>
is exactly how this is intended to work.
HTH
Erick
On Mon, Jun 28, 2010 at 9:09 AM, a peng wrote:
Hi,
My test result is that whenever the search phrase contains a word
that
don't
exist in the document, the search result will be empty no matter how
big
the
slop factor I set, seems this is a bug of Lucene, or it is work as design?
2010/6/28 tarun sapra <t.sapra97@gmail.com>
Hi ,
I think I have been able to understand whats happening here...
Indexed Content : "This is a test".
your search phrase : "This is a formal test"
your setting the slop factor 2 , now if your slop factor is 3 it
should
work
because "is" and "a" are stop words thus the words "This" and
"test"
are
2
slop factor apart but in your search phrase "This is a formal test"
the
words "This" and "test" are 3 slop factor thats why it's nor
working
now in search phrase "This is formal test" the words "This" and
"test"
are
2
slop factor apart thats why this phrase is working.
On Mon, Jun 28, 2010 at 11:37 AM, a peng <zhoudengpeng@gmail.com>
wrote:
Hi,
I am using StandardAnalyzer(Version.LUCENE_30);
2010/6/27 tarun sapra <t.sapra97@gmail.com>
which analyzer are you usin'?
On Sun, Jun 27, 2010 at 7:12 AM, a peng <
zhoudengpeng@gmail.com>
wrote:
Hi,
I know the indexed content contains the following text: "This
is
a
test".
And the search phrase I used is "This is a formal test", and
then
I
set
the
slop of the PhraseQuery as 2 with setSlop(2), but I found
that
I
can
not
get
a search result. If I set the search phrase as "This is
formal
test",
then
I
can get the search result.
So what is the problem here, thanks in advance.
Attached is the Java doc for the setSlop method:
public void *setSlop*(int s)
Sets the number of other words permitted between words in
query
phrase.
If
zero, then this is an exact phrase search. For larger values
this
works
like
a WITHIN or NEAR operator.
The slop is in fact an edit-distance, where the units
correspond
to
moves
of
terms in the query phrase out of position. For example, to
switch
the
order
of two words requires two moves (the first move places the
words
atop
one
another), so to permit re-orderings of phrases, the slop must
be
at
least
two.
More exact matches are scored higher than sloppier matches,
thus
search
results are sorted by exactness.
The slop is zero by default, requiring exact matches.
--
Thanks & Regards
Tarun Sapra
--
Thanks & Regards
Tarun Sapra
--
Thanks & Regards
Tarun Sapra