Hello Lucene Experts,

I wonder if someone might be able to shed some light on this interesting scoring question:

The problem:
Build a search query that returns hits ordered by how often each field value occurs across the matched documents (or as close to this as possible).
The built-in scoring works well for scoring the number of hits within a single document, but is there an efficient way to do this for the same field across a set of matched documents? (Maybe scoring isn't the best way?)

Let's say you have an index containing book information. Each document has a 'title' field.
Let's say the index contains 100 entries, with:
65 'title's containing the word 'tiger'
21 containing 'lion'
6 containing 'panther'
5 containing 'kitten'
3 containing 'slug'

What would be the best way to build a query such that returned documents are ordered in this way:
Rank  Value    Occurrences
1     tiger    65
2     lion     21
3     panther  6
4     kitten   5
5     slug     3

I can, of course, run a standard query, traverse the returned documents and build such a list myself, but if the query returned hundreds of thousands of hits, the cost would grow linearly with the number of hits, which is wasteful when only the 'Top 5' are actually required.
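
To make the brute-force approach concrete, here is a rough sketch of what I mean. It is plain Java with no Lucene calls at all: the class name TopTitleTerms and the method topTerms are made up for illustration, and it assumes the 'title' values of the matched documents have already been loaded into a list and that terms are simply split on whitespace.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
class TopTitleTerms {

    // Counts, for each term, how many titles contain it, then returns
    // the n most frequent terms in descending order of that count.
    static List<Map.Entry<String, Integer>> topTerms(List<String> titles, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String title : titles) {
            // Count a term at most once per title (documents containing it,
            // not total occurrences).
            Set<String> seen = new HashSet<>();
            for (String term : title.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!term.isEmpty() && seen.add(term)) {
                    counts.merge(term, 1, Integer::sum);
                }
            }
        }

        // Min-heap capped at n entries, so only the current top n are retained.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the smallest count seen so far
            }
        }

        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return result;
    }

    public static void main(String[] args) {
        List<String> titles = List.of(
                "Tiger Tales", "The Tiger", "Lion King", "tiger and lion", "Slug Life");
        for (Map.Entry<String, Integer> entry : topTerms(titles, 2)) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

The heap keeps the final selection cheap, but every matched title still has to be visited once to build the counts, which is exactly the linear cost I would like to avoid.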

One idea is to maintain a separate index with this information - the main problem with this is that you essentially need to know what you're searching for at index-time, which isn't ideal.

Has anyone come across and solved this particular issue using Lucene?

Many thanks,


Discussion Overview
group: java-user @
posted: Nov 22, '09 at 4:43p
active: Nov 22, '09 at 5:45p

2 users in discussion

Peter 4U: 2 posts
Jake Mannix: 1 post


