FAQ
Is there a (better) way to figure out which field in a document caused the document to be returned from a query? Currently, after I do a search across all of my fields and documents, I re-search each document that had a hit, one field at a time, and keep track of the scores. The highest-scoring field is the one that I credit with returning the document.

This is fine for a small index, with a small number of fields, but it definitely doesn't seem like the correct way to go about getting this information.

Any suggestions would be appreciated,

Thanks,

Dan




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


  • Matt Tucker at May 27, 2003 at 6:56 pm
    Dan,

    I don't have an answer to this question, unfortunately, but just wanted
    to say that we'd really love to see a better API for this too. :)

    Regards,
    Matt

  • Ype Kingma at May 27, 2003 at 6:58 pm
    Daniel,
    On Tuesday 27 May 2003 11:17, Armbrust, Daniel C. wrote:
    > Is there a (better) way that I can use to figure out which field in a
    > document caused the document to be returned from a query? Currently, after
    > I do a search across all of my fields and documents, I am researching on
    > each document that had a hit, on each field individually, and keeping track
    > of the scores.. The highest scoring field is the one that I credit with
    > returning the document.
    >
    > This is fine for a small index, with a small number of fields, but it
    > definitely doesn't seem like the correct way to go about getting this
    > information.
    >
    > Any suggestions would be appreciated,
    To avoid re-searching the previous hits, you have two options.

    You could make a separate index for each field, query all indexes
    with a MultiSearcher, and suppress repeated documents from
    other indexes with lower scores.
    You could also use a single index, but with
    a separate query for each field.
    Either way, the results can be collected in a single document
    collector object that keeps only the best hits. If you need
    to retrieve a stored field from each hit to determine the
    original source document, you will have to keep a larger
    number of best hits at first and later drop the
    lower-scoring ones for the same source document.
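
    The "keep only the best hit per source document" step described
    above can be sketched in plain Java. This is only an illustration,
    not Lucene API: the class BestFieldCollector, its methods, and the
    scores are hypothetical, and it assumes you already have a source
    document id, a field name, and a score for every hit of every
    per-field query.

    ```java
    import java.util.HashMap;
    import java.util.Map;

    /** Hypothetical helper (not part of Lucene): remembers the best-scoring field per source document. */
    final class BestFieldCollector {

        /** Best (field, score) pair seen so far for one source document. */
        static final class FieldHit {
            final String field;
            final float score;

            FieldHit(String field, float score) {
                this.field = field;
                this.score = score;
            }
        }

        private final Map<String, FieldHit> bestBySource = new HashMap<>();

        /** Record one hit from a per-field query; keep it only if it beats the current best. */
        void collect(String sourceId, String field, float score) {
            FieldHit current = bestBySource.get(sourceId);
            if (current == null || score > current.score) {
                bestBySource.put(sourceId, new FieldHit(field, score));
            }
        }

        /** Field credited with returning the document, or null if the document never matched. */
        String bestField(String sourceId) {
            FieldHit hit = bestBySource.get(sourceId);
            return hit == null ? null : hit.field;
        }

        public static void main(String[] args) {
            BestFieldCollector collector = new BestFieldCollector();
            // Scores below are made up for illustration only.
            collector.collect("doc-1", "abstract", 0.42f);
            collector.collect("doc-1", "title", 0.87f);
            collector.collect("doc-2", "fulltext", 0.15f);
            System.out.println(collector.bestField("doc-1")); // prints "title"
            System.out.println(collector.bestField("doc-2")); // prints "fulltext"
        }
    }
    ```

    Note that this compares raw scores from separate queries directly,
    which is the assumption the approach above relies on; whether those
    scores are comparable enough depends on your data.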

    I have a very similar situation with some databases
    containing titles and abstracts, some abstracts only,
    some full text with or without abstract.

    The nice thing about the default ranking mechanism
    is that it works out pretty well: short titles can
    score rather high, then normally abstracts,
    followed by full text. This is close to the
    separate index for each field I suggested above.

    The alternative, repeating the query with different
    fields in a single index, could actually be a bit better
    because you have to retrieve the stored field(s)
    for each document only once. However, it's not
    as flexible.

    Kind regards,
    Ype Kingma



Discussion Overview
group: java-user @
categories: lucene
posted: May 27, '03 at 6:17p
active: May 27, '03 at 6:58p
posts: 3
users: 3
website: lucene.apache.org
