FAQ
Hi,

I know the indexed content contains the following text: "This is a test".
And the search phrase I used is "This is a formal test", and then I set the
slop of the PhraseQuery as 2 with setSlop(2), but I found that I can not get
a search result. If I set the search phrase as "This is formal test", then I
can get the search result.

So what is the problem here, thanks in advance.


Attached is the Java doc for the setSlop method:

public void *setSlop*(int s)

Sets the number of other words permitted between words in query phrase. If
zero, then this is an exact phrase search. For larger values this works like
a WITHIN or NEAR operator.

The slop is in fact an edit-distance, where the units correspond to moves of
terms in the query phrase out of position. For example, to switch the order
of two words requires two moves (the first move places the words atop one
another), so to permit re-orderings of phrases, the slop must be at least
two.

More exact matches are scored higher than sloppier matches, thus search
results are sorted by exactness.

The slop is zero by default, requiring exact matches.

Search Discussions

  • Tarun sapra at Jun 27, 2010 at 3:51 pm
    which analyzer are you usin'?

    On Sun, Jun 27, 2010 at 7:12 AM, a peng wrote:

    Hi,

    I know the indexed content contains the following text: "This is a test".
    And the search phrase I used is "This is a formal test", and then I set the
    slop of the PhraseQuery as 2 with setSlop(2), but I found that I can not
    get
    a search result. If I set the search phrase as "This is formal test", then
    I
    can get the search result.

    So what is the problem here, thanks in advance.


    Attached is the Java doc for the setSlop method:

    public void *setSlop*(int s)

    Sets the number of other words permitted between words in query phrase. If
    zero, then this is an exact phrase search. For larger values this works
    like
    a WITHIN or NEAR operator.

    The slop is in fact an edit-distance, where the units correspond to moves
    of
    terms in the query phrase out of position. For example, to switch the order
    of two words requires two moves (the first move places the words atop one
    another), so to permit re-orderings of phrases, the slop must be at least
    two.

    More exact matches are scored higher than sloppier matches, thus search
    results are sorted by exactness.

    The slop is zero by default, requiring exact matches.


    --
    Thanks & Regards
    Tarun Sapra
  • A peng at Jun 28, 2010 at 6:08 am
    Hi,

    I am using StandardAnalyzer(Version.LUCENE_30);

    2010/6/27 tarun sapra <t.sapra97@gmail.com>
    which analyzer are you usin'?

    On Sun, Jun 27, 2010 at 7:12 AM, a peng wrote:

    Hi,

    I know the indexed content contains the following text: "This is a test".
    And the search phrase I used is "This is a formal test", and then I set the
    slop of the PhraseQuery as 2 with setSlop(2), but I found that I can not
    get
    a search result. If I set the search phrase as "This is formal test", then
    I
    can get the search result.

    So what is the problem here, thanks in advance.


    Attached is the Java doc for the setSlop method:

    public void *setSlop*(int s)

    Sets the number of other words permitted between words in query phrase. If
    zero, then this is an exact phrase search. For larger values this works
    like
    a WITHIN or NEAR operator.

    The slop is in fact an edit-distance, where the units correspond to moves
    of
    terms in the query phrase out of position. For example, to switch the order
    of two words requires two moves (the first move places the words atop one
    another), so to permit re-orderings of phrases, the slop must be at least
    two.

    More exact matches are scored higher than sloppier matches, thus search
    results are sorted by exactness.

    The slop is zero by default, requiring exact matches.


    --
    Thanks & Regards
    Tarun Sapra
  • Tarun sapra at Jun 28, 2010 at 6:43 am
    Hi ,

    I think I have been able to understand whats happening here...

    Indexed Content : "This is a test".
    your search phrase : "This is a formal test"
    your setting the slop factor 2 , now if your slop factor is 3 it should work
    because "is" and "a" are stop words thus the words "This" and "test" are 2
    slop factor apart but in your search phrase "This is a formal test" the
    words "This" and "test" are 3 slop factor thats why it's nor working
    now in search phrase "This is formal test" the words "This" and "test" are 2
    slop factor apart thats why this phrase is working.


    On Mon, Jun 28, 2010 at 11:37 AM, a peng wrote:

    Hi,

    I am using StandardAnalyzer(Version.LUCENE_30);

    2010/6/27 tarun sapra <t.sapra97@gmail.com>
    which analyzer are you usin'?

    On Sun, Jun 27, 2010 at 7:12 AM, a peng wrote:

    Hi,

    I know the indexed content contains the following text: "This is a
    test".
    And the search phrase I used is "This is a formal test", and then I set the
    slop of the PhraseQuery as 2 with setSlop(2), but I found that I can
    not
    get
    a search result. If I set the search phrase as "This is formal test", then
    I
    can get the search result.

    So what is the problem here, thanks in advance.


    Attached is the Java doc for the setSlop method:

    public void *setSlop*(int s)

    Sets the number of other words permitted between words in query phrase. If
    zero, then this is an exact phrase search. For larger values this works
    like
    a WITHIN or NEAR operator.

    The slop is in fact an edit-distance, where the units correspond to
    moves
    of
    terms in the query phrase out of position. For example, to switch the order
    of two words requires two moves (the first move places the words atop
    one
    another), so to permit re-orderings of phrases, the slop must be at
    least
    two.

    More exact matches are scored higher than sloppier matches, thus search
    results are sorted by exactness.

    The slop is zero by default, requiring exact matches.


    --
    Thanks & Regards
    Tarun Sapra


    --
    Thanks & Regards
    Tarun Sapra
  • A peng at Jun 28, 2010 at 1:11 pm
    Hi,

    My test result is that whenever the search phrase contains a word that don't
    exist in the document, the search result will be empty no matter how big the
    slop factor I set, seems this is a bug of Lucene, or it is work as design?

    2010/6/28 tarun sapra <t.sapra97@gmail.com>
    Hi ,

    I think I have been able to understand whats happening here...

    Indexed Content : "This is a test".
    your search phrase : "This is a formal test"
    your setting the slop factor 2 , now if your slop factor is 3 it should
    work
    because "is" and "a" are stop words thus the words "This" and "test" are 2
    slop factor apart but in your search phrase "This is a formal test" the
    words "This" and "test" are 3 slop factor thats why it's nor working
    now in search phrase "This is formal test" the words "This" and "test" are
    2
    slop factor apart thats why this phrase is working.


    On Mon, Jun 28, 2010 at 11:37 AM, a peng wrote:

    Hi,

    I am using StandardAnalyzer(Version.LUCENE_30);

    2010/6/27 tarun sapra <t.sapra97@gmail.com>
    which analyzer are you usin'?

    On Sun, Jun 27, 2010 at 7:12 AM, a peng wrote:

    Hi,

    I know the indexed content contains the following text: "This is a
    test".
    And the search phrase I used is "This is a formal test", and then I
    set
    the
    slop of the PhraseQuery as 2 with setSlop(2), but I found that I can
    not
    get
    a search result. If I set the search phrase as "This is formal test", then
    I
    can get the search result.

    So what is the problem here, thanks in advance.


    Attached is the Java doc for the setSlop method:

    public void *setSlop*(int s)

    Sets the number of other words permitted between words in query
    phrase.
    If
    zero, then this is an exact phrase search. For larger values this
    works
    like
    a WITHIN or NEAR operator.

    The slop is in fact an edit-distance, where the units correspond to
    moves
    of
    terms in the query phrase out of position. For example, to switch the order
    of two words requires two moves (the first move places the words atop
    one
    another), so to permit re-orderings of phrases, the slop must be at
    least
    two.

    More exact matches are scored higher than sloppier matches, thus
    search
    results are sorted by exactness.

    The slop is zero by default, requiring exact matches.


    --
    Thanks & Regards
    Tarun Sapra


    --
    Thanks & Regards
    Tarun Sapra
  • Erick Erickson at Jun 28, 2010 at 1:21 pm
    I think you're misunderstanding the intent of PhraseQueries and slop. Slop
    is the number of intervening tokens that may exist between the words
    you're looking for. However, all the words you're looking for MUST exist.
    So,

    <<< whenever the search phrase contains a word that don't
    exist in the document, the search result will be empty >>>

    is exactly how this is intended to work.

    HTH
    Erick

    On Mon, Jun 28, 2010 at 9:09 AM, a peng wrote:

    Hi,

    My test result is that whenever the search phrase contains a word that
    don't
    exist in the document, the search result will be empty no matter how big
    the
    slop factor I set, seems this is a bug of Lucene, or it is work as design?

    2010/6/28 tarun sapra <t.sapra97@gmail.com>
    Hi ,

    I think I have been able to understand whats happening here...

    Indexed Content : "This is a test".
    your search phrase : "This is a formal test"
    your setting the slop factor 2 , now if your slop factor is 3 it should
    work
    because "is" and "a" are stop words thus the words "This" and "test" are 2
    slop factor apart but in your search phrase "This is a formal test" the
    words "This" and "test" are 3 slop factor thats why it's nor working
    now in search phrase "This is formal test" the words "This" and "test" are
    2
    slop factor apart thats why this phrase is working.


    On Mon, Jun 28, 2010 at 11:37 AM, a peng wrote:

    Hi,

    I am using StandardAnalyzer(Version.LUCENE_30);

    2010/6/27 tarun sapra <t.sapra97@gmail.com>
    which analyzer are you usin'?


    On Sun, Jun 27, 2010 at 7:12 AM, a peng <zhoudengpeng@gmail.com>
    wrote:
    Hi,

    I know the indexed content contains the following text: "This is a
    test".
    And the search phrase I used is "This is a formal test", and then I
    set
    the
    slop of the PhraseQuery as 2 with setSlop(2), but I found that I
    can
    not
    get
    a search result. If I set the search phrase as "This is formal
    test",
    then
    I
    can get the search result.

    So what is the problem here, thanks in advance.


    Attached is the Java doc for the setSlop method:

    public void *setSlop*(int s)

    Sets the number of other words permitted between words in query
    phrase.
    If
    zero, then this is an exact phrase search. For larger values this
    works
    like
    a WITHIN or NEAR operator.

    The slop is in fact an edit-distance, where the units correspond to
    moves
    of
    terms in the query phrase out of position. For example, to switch
    the
    order
    of two words requires two moves (the first move places the words
    atop
    one
    another), so to permit re-orderings of phrases, the slop must be at
    least
    two.

    More exact matches are scored higher than sloppier matches, thus
    search
    results are sorted by exactness.

    The slop is zero by default, requiring exact matches.


    --
    Thanks & Regards
    Tarun Sapra


    --
    Thanks & Regards
    Tarun Sapra
  • Tarun sapra at Jun 28, 2010 at 3:33 pm
    Hey Erick

    Thanks mate!

    So I guess my explanation in the mail chain above was correct!
    On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson wrote:

    I think you're misunderstanding the intent of PhraseQueries and slop. Slop
    is the number of intervening tokens that may exist between the words
    you're looking for. However, all the words you're looking for MUST exist.
    So,

    <<< whenever the search phrase contains a word that don't
    exist in the document, the search result will be empty >>>

    is exactly how this is intended to work.

    HTH
    Erick

    On Mon, Jun 28, 2010 at 9:09 AM, a peng wrote:

    Hi,

    My test result is that whenever the search phrase contains a word that
    don't
    exist in the document, the search result will be empty no matter how big
    the
    slop factor I set, seems this is a bug of Lucene, or it is work as design?
    2010/6/28 tarun sapra <t.sapra97@gmail.com>
    Hi ,

    I think I have been able to understand whats happening here...

    Indexed Content : "This is a test".
    your search phrase : "This is a formal test"
    your setting the slop factor 2 , now if your slop factor is 3 it should
    work
    because "is" and "a" are stop words thus the words "This" and "test"
    are
    2
    slop factor apart but in your search phrase "This is a formal test" the
    words "This" and "test" are 3 slop factor thats why it's nor working
    now in search phrase "This is formal test" the words "This" and "test" are
    2
    slop factor apart thats why this phrase is working.


    On Mon, Jun 28, 2010 at 11:37 AM, a peng wrote:

    Hi,

    I am using StandardAnalyzer(Version.LUCENE_30);

    2010/6/27 tarun sapra <t.sapra97@gmail.com>
    which analyzer are you usin'?


    On Sun, Jun 27, 2010 at 7:12 AM, a peng <zhoudengpeng@gmail.com>
    wrote:
    Hi,

    I know the indexed content contains the following text: "This is
    a
    test".
    And the search phrase I used is "This is a formal test", and then
    I
    set
    the
    slop of the PhraseQuery as 2 with setSlop(2), but I found that I
    can
    not
    get
    a search result. If I set the search phrase as "This is formal
    test",
    then
    I
    can get the search result.

    So what is the problem here, thanks in advance.


    Attached is the Java doc for the setSlop method:

    public void *setSlop*(int s)

    Sets the number of other words permitted between words in query
    phrase.
    If
    zero, then this is an exact phrase search. For larger values this
    works
    like
    a WITHIN or NEAR operator.

    The slop is in fact an edit-distance, where the units correspond
    to
    moves
    of
    terms in the query phrase out of position. For example, to switch
    the
    order
    of two words requires two moves (the first move places the words
    atop
    one
    another), so to permit re-orderings of phrases, the slop must be
    at
    least
    two.

    More exact matches are scored higher than sloppier matches, thus
    search
    results are sorted by exactness.

    The slop is zero by default, requiring exact matches.


    --
    Thanks & Regards
    Tarun Sapra


    --
    Thanks & Regards
    Tarun Sapra


    --
    Thanks & Regards
    Tarun Sapra
  • Erick Erickson at Jun 28, 2010 at 6:56 pm
    No, I don't think so. The critical bit is that the indexed text
    does NOT contain the word "formal". So searching for
    any phrase that DOES contain "formal" should fail no matter
    what the slop.

    Phrase queries are something like "find all the words in this
    search string, ignoring some number of intervening tokens not in the
    search string". There's nothing in there about "find only some of the
    words in the search string"....

    I'm guessing that the original post had a typo in the success case,
    because it's contradicted by "a peng's" second post. It's always
    possible that I'm experiencing a brain short......

    Best
    Erick
    On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra wrote:

    Hey Erick

    Thanks mate!

    So I guess my explanation in the mail chain above was correct!

    On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <erickerickson@gmail.com
    wrote:
    I think you're misunderstanding the intent of PhraseQueries and slop. Slop
    is the number of intervening tokens that may exist between the words
    you're looking for. However, all the words you're looking for MUST exist.
    So,

    <<< whenever the search phrase contains a word that don't
    exist in the document, the search result will be empty >>>

    is exactly how this is intended to work.

    HTH
    Erick

    On Mon, Jun 28, 2010 at 9:09 AM, a peng wrote:

    Hi,

    My test result is that whenever the search phrase contains a word that
    don't
    exist in the document, the search result will be empty no matter how
    big
    the
    slop factor I set, seems this is a bug of Lucene, or it is work as design?
    2010/6/28 tarun sapra <t.sapra97@gmail.com>
    Hi ,

    I think I have been able to understand whats happening here...

    Indexed Content : "This is a test".
    your search phrase : "This is a formal test"
    your setting the slop factor 2 , now if your slop factor is 3 it
    should
    work
    because "is" and "a" are stop words thus the words "This" and "test"
    are
    2
    slop factor apart but in your search phrase "This is a formal test"
    the
    words "This" and "test" are 3 slop factor thats why it's nor working
    now in search phrase "This is formal test" the words "This" and
    "test"
    are
    2
    slop factor apart thats why this phrase is working.



    On Mon, Jun 28, 2010 at 11:37 AM, a peng <zhoudengpeng@gmail.com>
    wrote:
    Hi,

    I am using StandardAnalyzer(Version.LUCENE_30);

    2010/6/27 tarun sapra <t.sapra97@gmail.com>
    which analyzer are you usin'?


    On Sun, Jun 27, 2010 at 7:12 AM, a peng <zhoudengpeng@gmail.com>
    wrote:
    Hi,

    I know the indexed content contains the following text: "This
    is
    a
    test".
    And the search phrase I used is "This is a formal test", and
    then
    I
    set
    the
    slop of the PhraseQuery as 2 with setSlop(2), but I found that
    I
    can
    not
    get
    a search result. If I set the search phrase as "This is formal
    test",
    then
    I
    can get the search result.

    So what is the problem here, thanks in advance.


    Attached is the Java doc for the setSlop method:

    public void *setSlop*(int s)

    Sets the number of other words permitted between words in query
    phrase.
    If
    zero, then this is an exact phrase search. For larger values
    this
    works
    like
    a WITHIN or NEAR operator.

    The slop is in fact an edit-distance, where the units
    correspond
    to
    moves
    of
    terms in the query phrase out of position. For example, to
    switch
    the
    order
    of two words requires two moves (the first move places the
    words
    atop
    one
    another), so to permit re-orderings of phrases, the slop must
    be
    at
    least
    two.

    More exact matches are scored higher than sloppier matches,
    thus
    search
    results are sorted by exactness.

    The slop is zero by default, requiring exact matches.


    --
    Thanks & Regards
    Tarun Sapra


    --
    Thanks & Regards
    Tarun Sapra


    --
    Thanks & Regards
    Tarun Sapra
  • A peng at Jun 29, 2010 at 1:58 am
    Hi Erick,

    Thanks for you reply, now I get the point why I can not get the search
    result. But can you guide me how can I use Lucene to implement the following
    search feature:
    Basically we can call this feature "fuzzy phrase search", which means the
    search phrase may contains more words or less words compared to the
    sentences indexed in the document. Again I want to use the sample I posted
    to demo it:

    The document contains a sentence "This is a test", and the search phrase is
    "This is a formal test", as there is only one word difference between the
    two sentences, how can I get the "This is a test" phrase as the search
    result?

    Thanks.



    2010/6/29 Erick Erickson <erickerickson@gmail.com>
    No, I don't think so. The critical bit is that the indexed text
    does NOT contain the word "formal". So searching for
    any phrase that DOES contain "formal" should fail no matter
    what the slop.

    Phrase queries are something like "find all the words in this
    search string, ignoring some number of intervening tokens not in the
    search string". There's nothing in there about "find only some of the
    words in the search string"....

    I'm guessing that the original post had a typo in the success case,
    because it's contradicted by "a peng's" second post. It's always
    possible that I'm experiencing a brain short......

    Best
    Erick
    On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra wrote:

    Hey Erick

    Thanks mate!

    So I guess my explanation in the mail chain above was correct!

    On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <erickerickson@gmail.com
    wrote:
    I think you're misunderstanding the intent of PhraseQueries and slop. Slop
    is the number of intervening tokens that may exist between the words
    you're looking for. However, all the words you're looking for MUST
    exist.
    So,

    <<< whenever the search phrase contains a word that don't
    exist in the document, the search result will be empty >>>

    is exactly how this is intended to work.

    HTH
    Erick

    On Mon, Jun 28, 2010 at 9:09 AM, a peng wrote:

    Hi,

    My test result is that whenever the search phrase contains a word
    that
    don't
    exist in the document, the search result will be empty no matter how
    big
    the
    slop factor I set, seems this is a bug of Lucene, or it is work as design?
    2010/6/28 tarun sapra <t.sapra97@gmail.com>
    Hi ,

    I think I have been able to understand whats happening here...

    Indexed Content : "This is a test".
    your search phrase : "This is a formal test"
    your setting the slop factor 2 , now if your slop factor is 3 it
    should
    work
    because "is" and "a" are stop words thus the words "This" and
    "test"
    are
    2
    slop factor apart but in your search phrase "This is a formal test"
    the
    words "This" and "test" are 3 slop factor thats why it's nor
    working
    now in search phrase "This is formal test" the words "This" and
    "test"
    are
    2
    slop factor apart thats why this phrase is working.



    On Mon, Jun 28, 2010 at 11:37 AM, a peng <zhoudengpeng@gmail.com>
    wrote:
    Hi,

    I am using StandardAnalyzer(Version.LUCENE_30);

    2010/6/27 tarun sapra <t.sapra97@gmail.com>
    which analyzer are you usin'?


    On Sun, Jun 27, 2010 at 7:12 AM, a peng <
    zhoudengpeng@gmail.com>
    wrote:
    Hi,

    I know the indexed content contains the following text: "This
    is
    a
    test".
    And the search phrase I used is "This is a formal test", and
    then
    I
    set
    the
    slop of the PhraseQuery as 2 with setSlop(2), but I found
    that
    I
    can
    not
    get
    a search result. If I set the search phrase as "This is
    formal
    test",
    then
    I
    can get the search result.

    So what is the problem here, thanks in advance.


    Attached is the Java doc for the setSlop method:

    public void *setSlop*(int s)

    Sets the number of other words permitted between words in
    query
    phrase.
    If
    zero, then this is an exact phrase search. For larger values
    this
    works
    like
    a WITHIN or NEAR operator.

    The slop is in fact an edit-distance, where the units
    correspond
    to
    moves
    of
    terms in the query phrase out of position. For example, to
    switch
    the
    order
    of two words requires two moves (the first move places the
    words
    atop
    one
    another), so to permit re-orderings of phrases, the slop must
    be
    at
    least
    two.

    More exact matches are scored higher than sloppier matches,
    thus
    search
    results are sorted by exactness.

    The slop is zero by default, requiring exact matches.


    --
    Thanks & Regards
    Tarun Sapra


    --
    Thanks & Regards
    Tarun Sapra


    --
    Thanks & Regards
    Tarun Sapra
  • A peng at Jun 30, 2010 at 12:31 am
    Hi Erick,

    Any comments about this requirement?

    2010/6/29 a peng <zhoudengpeng@gmail.com>
    Hi Erick,

    Thanks for you reply, now I get the point why I can not get the search
    result. But can you guide me how can I use Lucene to implement the following
    search feature:
    Basically we can call this feature "fuzzy phrase search", which means the
    search phrase may contains more words or less words compared to the
    sentences indexed in the document. Again I want to use the sample I posted
    to demo it:

    The document contains a sentence "This is a test", and the search phrase is
    "This is a formal test", as there is only one word difference between the
    two sentences, how can I get the "This is a test" phrase as the search
    result?

    Thanks.



    2010/6/29 Erick Erickson <erickerickson@gmail.com>

    No, I don't think so. The critical bit is that the indexed text
    does NOT contain the word "formal". So searching for
    any phrase that DOES contain "formal" should fail no matter
    what the slop.

    Phrase queries are something like "find all the words in this
    search string, ignoring some number of intervening tokens not in the
    search string". There's nothing in there about "find only some of the
    words in the search string"....

    I'm guessing that the original post had a typo in the success case,
    because it's contradicted by "a peng's" second post. It's always
    possible that I'm experiencing a brain short......

    Best
    Erick

    On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra <t.sapra97@gmail.com>
    wrote:
    Hey Erick

    Thanks mate!

    So I guess my explanation in the mail chain above was correct!

    On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <
    erickerickson@gmail.com
    wrote:
    I think you're misunderstanding the intent of PhraseQueries and slop. Slop
    is the number of intervening tokens that may exist between the words
    you're looking for. However, all the words you're looking for MUST
    exist.
    So,

    <<< whenever the search phrase contains a word that don't
    exist in the document, the search result will be empty >>>

    is exactly how this is intended to work.

    HTH
    Erick


    On Mon, Jun 28, 2010 at 9:09 AM, a peng <zhoudengpeng@gmail.com>
    wrote:
    Hi,

    My test result is that whenever the search phrase contains a word
    that
    don't
    exist in the document, the search result will be empty no matter how
    big
    the
    slop factor I set, seems this is a bug of Lucene, or it is work as design?
    2010/6/28 tarun sapra <t.sapra97@gmail.com>
    Hi ,

    I think I have been able to understand whats happening here...

    Indexed Content : "This is a test".
    your search phrase : "This is a formal test"
    your setting the slop factor 2 , now if your slop factor is 3 it
    should
    work
    because "is" and "a" are stop words thus the words "This" and
    "test"
    are
    2
    slop factor apart but in your search phrase "This is a formal
    test"
    the
    words "This" and "test" are 3 slop factor thats why it's nor
    working
    now in search phrase "This is formal test" the words "This" and
    "test"
    are
    2
    slop factor apart thats why this phrase is working.



    On Mon, Jun 28, 2010 at 11:37 AM, a peng <zhoudengpeng@gmail.com>
    wrote:
    Hi,

    I am using StandardAnalyzer(Version.LUCENE_30);

    2010/6/27 tarun sapra <t.sapra97@gmail.com>
    which analyzer are you usin'?


    On Sun, Jun 27, 2010 at 7:12 AM, a peng <
    zhoudengpeng@gmail.com>
    wrote:
    Hi,

    I know the indexed content contains the following text:
    "This
    is
    a
    test".
    And the search phrase I used is "This is a formal test", and
    then
    I
    set
    the
    slop of the PhraseQuery as 2 with setSlop(2), but I found
    that
    I
    can
    not
    get
    a search result. If I set the search phrase as "This is
    formal
    test",
    then
    I
    can get the search result.

    So what is the problem here, thanks in advance.


    Attached is the Java doc for the setSlop method:

    public void *setSlop*(int s)

    Sets the number of other words permitted between words in
    query
    phrase.
    If
    zero, then this is an exact phrase search. For larger values
    this
    works
    like
    a WITHIN or NEAR operator.

    The slop is in fact an edit-distance, where the units
    correspond
    to
    moves
    of
    terms in the query phrase out of position. For example, to
    switch
    the
    order
    of two words requires two moves (the first move places the
    words
    atop
    one
    another), so to permit re-orderings of phrases, the slop
    must
    be
    at
    least
    two.

    More exact matches are scored higher than sloppier matches,
    thus
    search
    results are sorted by exactness.

    The slop is zero by default, requiring exact matches.


    --
    Thanks & Regards
    Tarun Sapra


    --
    Thanks & Regards
    Tarun Sapra


    --
    Thanks & Regards
    Tarun Sapra
  • Erick Erickson at Jun 30, 2010 at 1:09 am
    No, I really don't have a good solution off the top of my head, perhaps
    others can chime in...

    Although I suppose you could fire multiple queries dropping out
    some number of search terms, but I don't know whether that
    satisfies your requirements. E.g. search the following phrases
    "this is a formal test"
    "this is a formal"
    "this is a test"
    "this is formal test"
    "this a formal test"
    "is a formal test"

    and so on...

    Best
    Erick
    On Tue, Jun 29, 2010 at 8:30 PM, a peng wrote:

    Hi Erick,

    Any comments about this requirement?

    2010/6/29 a peng <zhoudengpeng@gmail.com>
    Hi Erick,

    Thanks for you reply, now I get the point why I can not get the search
    result. But can you guide me how can I use Lucene to implement the following
    search feature:
    Basically we can call this feature "fuzzy phrase search", which means the
    search phrase may contains more words or less words compared to the
    sentences indexed in the document. Again I want to use the sample I posted
    to demo it:

    The document contains a sentence "This is a test", and the search phrase is
    "This is a formal test", as there is only one word difference between the
    two sentences, how can I get the "This is a test" phrase as the search
    result?

    Thanks.



    2010/6/29 Erick Erickson <erickerickson@gmail.com>

    No, I don't think so. The critical bit is that the indexed text
    does NOT contain the word "formal". So searching for
    any phrase that DOES contain "formal" should fail no matter
    what the slop.

    Phrase queries are something like "find all the words in this
    search string, ignoring some number of intervening tokens not in the
    search string". There's nothing in there about "find only some of the
    words in the search string"....

    I'm guessing that the original post had a typo in the success case,
    because it's contradicted by "a peng's" second post. It's always
    possible that I'm experiencing a brain short......

    Best
    Erick

    On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra <t.sapra97@gmail.com>
    wrote:
    Hey Erick

    Thanks mate!

    So I guess my explanation in the mail chain above was correct!

    On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <
    erickerickson@gmail.com
    wrote:
    I think you're misunderstanding the intent of PhraseQueries and
    slop.
    Slop
    is the number of intervening tokens that may exist between the words
    you're looking for. However, all the words you're looking for MUST
    exist.
    So,

    <<< whenever the search phrase contains a word that don't
    exist in the document, the search result will be empty >>>

    is exactly how this is intended to work.

    HTH
    Erick


    On Mon, Jun 28, 2010 at 9:09 AM, a peng <zhoudengpeng@gmail.com>
    wrote:
    Hi,

    My test result is that whenever the search phrase contains a word
    that
    don't
    exist in the document, the search result will be empty no matter
    how
    big
    the
    slop factor I set, seems this is a bug of Lucene, or it is work as design?
    2010/6/28 tarun sapra <t.sapra97@gmail.com>
    Hi ,

    I think I have been able to understand whats happening here...

    Indexed Content : "This is a test".
    your search phrase : "This is a formal test"
    your setting the slop factor 2 , now if your slop factor is 3 it
    should
    work
    because "is" and "a" are stop words thus the words "This" and
    "test"
    are
    2
    slop factor apart but in your search phrase "This is a formal
    test"
    the
    words "This" and "test" are 3 slop factor thats why it's nor
    working
    now in search phrase "This is formal test" the words "This" and
    "test"
    are
    2
    slop factor apart thats why this phrase is working.



    On Mon, Jun 28, 2010 at 11:37 AM, a peng <
    zhoudengpeng@gmail.com>
    wrote:
    Hi,

    I am using StandardAnalyzer(Version.LUCENE_30);

    2010/6/27 tarun sapra <t.sapra97@gmail.com>
    which analyzer are you usin'?


    On Sun, Jun 27, 2010 at 7:12 AM, a peng <
    zhoudengpeng@gmail.com>
    wrote:
    Hi,

    I know the indexed content contains the following text:
    "This
    is
    a
    test".
    And the search phrase I used is "This is a formal test",
    and
    then
    I
    set
    the
    slop of the PhraseQuery as 2 with setSlop(2), but I found
    that
    I
    can
    not
    get
    a search result. If I set the search phrase as "This is
    formal
    test",
    then
    I
    can get the search result.

    So what is the problem here, thanks in advance.


    Attached is the Java doc for the setSlop method:

    public void *setSlop*(int s)

    Sets the number of other words permitted between words in
    query
    phrase.
    If
    zero, then this is an exact phrase search. For larger
    values
    this
    works
    like
    a WITHIN or NEAR operator.

    The slop is in fact an edit-distance, where the units
    correspond
    to
    moves
    of
    terms in the query phrase out of position. For example, to
    switch
    the
    order
    of two words requires two moves (the first move places the
    words
    atop
    one
    another), so to permit re-orderings of phrases, the slop
    must
    be
    at
    least
    two.

    More exact matches are scored higher than sloppier
    matches,
    thus
    search
    results are sorted by exactness.

    The slop is zero by default, requiring exact matches.


    --
    Thanks & Regards
    Tarun Sapra


    --
    Thanks & Regards
    Tarun Sapra


    --
    Thanks & Regards
    Tarun Sapra

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedJun 27, '10 at 2:12p
activeJun 30, '10 at 1:09a
posts11
users3
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase