What I need is the following.
Say my document field is (ab, bc, cd, ef) and the search tokens are (ab, bc, cd).

The requirements are:
1) I should get a hit even if not all of the search tokens are present.
2) If the tokens are found, they should be within a distance x of each other (proximity search).
3) I need the percentage match of the search tokens against the document field.

Currently this is my query:
1) I form all possible permutations of the search tokens.
2) I build a SpanNearQuery for each permutation.
3) I combine the SpanNearQueries in a DisjunctionMaxQuery.
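
Step 1 above can be sketched in plain Java as follows. This is only an illustration of the permutation step, not Lucene code; the class and method names (`TokenPermutations`, `permutations`) are made up for this example. Note that n distinct tokens produce n! orderings, so the number of SpanNearQueries grows very quickly. It may also be worth checking whether a single SpanNearQuery built with `inOrder` set to false, which matches its clauses in any order, would make the permutations unnecessary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class TokenPermutations {
    /** Returns every ordering of the given tokens (n! lists for n distinct tokens). */
    static List<List<String>> permutations(List<String> tokens) {
        List<List<String>> out = new ArrayList<>();
        permute(new ArrayList<>(tokens), 0, out);
        return out;
    }

    /** Fixes positions [0, start) and tries every remaining token at position start. */
    private static void permute(List<String> work, int start, List<List<String>> out) {
        if (start >= work.size() - 1) {
            out.add(new ArrayList<>(work)); // copy, because work is mutated afterwards
            return;
        }
        for (int i = start; i < work.size(); i++) {
            Collections.swap(work, start, i);
            permute(work, start + 1, out);
            Collections.swap(work, start, i); // undo the swap before the next choice
        }
    }

    public static void main(String[] args) {
        // Each printed list would become the clause order of one SpanNearQuery.
        for (List<String> p : permutations(Arrays.asList("ab", "bc", "cd"))) {
            System.out.println(p);
        }
    }
}
```

For the three tokens in the example this yields 6 orderings; for 8 tokens it would already be 40320.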

This is how I compute the % match:
% match = (score from running the query on the document field) /
(score from running the query on a document field created out of the search tokens)

The numerator is the actual score of the search tokens against the field.
The denominator is the best possible (maximum) score for the current search tokens.
For the example above (document field (ab, bc, cd, ef), search tokens (ab, bc, cd)) I expect a % match of around 90%.
However, I get a match of only around 50% without a boost. Using a boost in fact reduces my percentage.
I even overrode the queryNorm method to return 1, but the percentage still did not increase.
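
One way to get a percentage that does not depend on Lucene scores at all is to compute it in application code from the token lists, for example on the hits a looser query returns. The sketch below is an assumption about what "percentage match" could mean, not something Lucene provides: it reports the share of distinct search tokens found inside the best window of the field whose first and last positions are at most x apart. The names `ProximityCoverage`, `percentMatch` and `maxDistance` are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ProximityCoverage {
    /**
     * Percentage of distinct search tokens that occur inside the best window of
     * fieldTokens, where a window covers positions start..start+maxDistance.
     */
    static double percentMatch(List<String> fieldTokens, List<String> searchTokens, int maxDistance) {
        Set<String> wanted = new HashSet<>(searchTokens);
        if (wanted.isEmpty()) {
            return 0.0; // nothing to match against
        }
        int best = 0;
        for (int start = 0; start < fieldTokens.size(); start++) {
            Set<String> found = new HashSet<>();
            int end = Math.min(fieldTokens.size() - 1, start + maxDistance);
            for (int i = start; i <= end; i++) {
                String token = fieldTokens.get(i);
                if (wanted.contains(token)) {
                    found.add(token); // count each search token once per window
                }
            }
            best = Math.max(best, found.size());
        }
        return 100.0 * best / wanted.size();
    }
}
```

Under this definition the example field (ab, bc, cd, ef) with search tokens (ab, bc, cd) and x = 2 gives 100%, not the roughly 90% expected above, because the extra field token ef is not penalized. A field such as (ab, xx, yy, bc) with x = 3 gives about 66.7%, since only two of the three tokens fall in one window.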
Is there any way of implementing this with the existing classes in Lucene,
without making complex changes to the structure itself (which is what I
gather has to be done, from the previous replies)?

Can anyone suggest an alternative way of implementing this requirement
using the existing classes in Lucene, not necessarily the ones I have used?


  • Radha Sreedharan at Apr 19, 2009 at 12:31 pm
  • Rads2029 at Apr 21, 2009 at 2:30 pm
    Hi all,

    Does anybody have a solution to this query?

    regards,
    radha

    --
    View this message in context: http://www.nabble.com/Proximity-and-Percentage-match-search-in-Lucene-tp23122481p23157398.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Radha Sreedharan at Apr 25, 2009 at 9:48 am

    Regards,
    Radha

  • Chris Hostetter at Apr 28, 2009 at 10:44 pm
    Radha: replying to or re-forwarding the same message over and over doesn't
    tend to be a useful way to encourage additional replies. If you do have
    something to add to an existing discussion that you've started, you should
    at least do it as a reply to the original discussion so people have the
    full context...
    http://www.nabble.com/Need-help-%3A-SpanNearQuery-to23077372.html

    I'm really not sure that anyone has any new additional info to offer you.
    This is a particularly hard problem that doesn't have a very efficient
    solution that I know of. It sounds like you've already tried the obvious
    solution but aren't happy with the scores produced. If you can't get
    the scores you want by tweaking the Similarity options available, then
    implementing custom Query/Scorer classes is really the only remaining
    option.


    : Date: Sun, 19 Apr 2009 17:52:27 +0530
    : From: Radha Sreedharan <radha84@gmail.com>
    : Reply-To: java-user@lucene.apache.org
    : To: java-user@lucene.apache.org
    : Subject: Proximity and Percentage match search in Lucene



    -Hoss



Discussion Overview
group: java-user
categories: lucene
posted: Apr 19, '09 at 12:22p
active: Apr 28, '09 at 10:44p
posts: 5
users: 2
website: lucene.apache.org
