It sounds like you'll have to build logic that knows when the index has been reopened and repopulates your cache. Take a look at Solr, it does this type of stuff.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/
- Tag - Search - Share
----- Original Message ----
From: Carlos Pita <firstname.lastname@example.org>
Sent: Thursday, May 24, 2007 12:50:04 PM
Subject: Re: HitCollector or Hits
I don't think that FieldSelector would be that valuable in my case because I
just need to access a few fields, and those are all fields that are in fact
stored (and indexed too). I was thinking of keeping this extra information
in memory, precisely into an array mapping doc ids to the data structure. I
see that this is done for ScoreDocComparator in a Lucene in Action example.
I'm still not sure how to achieve something similar with a HitCollector. I
mean, I could instantiate a maxDoc() size array and index it by the document
ids that are passed to the collector. But that said, I don't know how to
keep this array synchronized with the index. I've opened a new thread for
this subject, "maxDoc and arrays".
Thank you again.
On 5/24/07, Erick Erickson wrote:
You're on the right track. But that said, access to anything that's
indexed (stored or not) should be pretty quick. Things
stored, but not indexed, are costlier. This might drive your
decision on what to index .vs. store.....
Loading the document is anything like IndexReader.document(), or
Part of the difference is that if you load the document, you get
all the fields, whether you need them or not.
Also, you can use your own TermEnum/TermDocs lookup for
this kind of thing if the terms you're interested in are indexed...
I wrote a mail some time ago that detailed my experience, in my
situation with my peculiar data set that you may want to read,
Lucene 2.1, using FieldSelector speeds up my app by a factor of 10+,
As I mentioned in that message, I suspect that my improvement was
*highly* dependent upon how the index is structured.....
All that said, your notion of benchmarking is a very good one. It lead
me to using FieldSelector in the first place...
On 5/24/07, Carlos Pita wrote:
thank you for your prompt answer. What do you mean by loading the
Accessing one of the stored fields? In that case I'm afraid I would need
do it. For example, in the aforementioned case of a result of products, I
have to look at any product store_id, which is stored along the document.
this a performance killer? Maybe I should keep some tables in memory, for
example an array mapping from id to store_id in O(1). I will do some
benchmarking before anyway.
On 5/24/07, Erick Erickson wrote:
I know of no way to alter the Hits behavior, I recommend using
But be aware that if you load the document for each one, you may incur
a significant penalty, although the lazy-loading helped me a lot, see
On 5/23/07, Carlos Pita wrote:
I need to collect some global information from my first 1000 search
in order to build up some search refining components containing only
relevant values (those which correspond to at least one of the first 1000
hits). For example, the results are products and there is a store
component that shows only the stores that sells a product between
1000 hits. So even if the user sees just the first 20, I would have
inspect the first 1000. I've read that Hits mantains a cache of
200 hits. Is this configurable? If I could set this cache to 1000 I would
then use Hits to browse the search results. Another way, I should
HitCollector. What's your advice?
To unsubscribe, e-mail: email@example.com
For additional commands, e-mail: firstname.lastname@example.org