FAQ
Daniel,
On Tuesday 27 May 2003 11:17, Armbrust, Daniel C. wrote:
Is there a (better) way that I can use to figure out which field in a
document caused the document to be returned from a query? Currently, after
I do a search across all of my fields and documents, I am researching on
each document that had a hit, on each field individually, and keeping track
of the scores.. The highest scoring field is the one that I credit with
returning the document.

This is fine for a small index, with a small number of fields, but it
definitely doesn't seem like the correct way to go about getting this
information.

Any suggestions would be appreciated,
To save the researching on previous hits you have two options.

You could make a separate index for each field, query all indexes
with a MultiSearcher, and suppress repeated documents from
other indexes with lower scores.
You could also do use a single database, but with
a separate query for each subfield.
Either way the results can be collected in a single document
collector object, keeping only the best hits. You'll
have to keep a larger number of best hits first and later drop
the lower scoring ones for the same source document
in case you need to retrieve a stored field from each
hit to determine the original source document.

I have a very similar situation with some databases
containing titles and abstracts, some abstracts only,
some full text with or without abstract.

The nice thing about the default ranking mechanism
is that it works out pretty well: short titles can
score rather high, then normally abstracts,
followed by full text. This is close to the
separate index for each field I suggested above.

The alternative to repeat the query with different
fields in a single index could actually be bit better
because you need have to retrieve the stored field(s)
for each document only once. However, it's not
as flexible.

Kind regards,
Ype Kingma


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Search Discussions

Discussion Posts

Previous

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 3 of 3 | next ›
Discussion Overview
groupjava-user @
categorieslucene
postedMay 27, '03 at 6:17p
activeMay 27, '03 at 6:58p
posts3
users3
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase