FAQ
I think others will have more thoughts on this, esp. for Numeric* questions... but I'll try answering...

----- Original Message ----
From: Tomislav Poljak <tpoljak@gmail.com>
To: java-user@lucene.apache.org
Sent: Fri, May 7, 2010 2:34:46 PM
Subject: Filter vs. TermQuery performance

Hi,
when is it wise to replace a TermQuery with cached Filter
(regarding search performance). If TermQuery is used only to filter results
based on field value (it doesn't participate in scoring), is it alway wise
to replace it with filter?
Yes, assuming the filter will be reused. I think there is not a lot of value in using a filter (vs. just a regular query) if that filter will not be reused. This is why in Solr "fq"s (filtered queries) are cached in a special filter cache. I *think* the only other benefit of using a filter query vs., say, TermQuery, is that the former will not spend any time/CPU on computing the score for the filter part.
Is it only wise if Filter is cached (wrapped in CachingWrapperFilter) and reused often?
I think so. See above.
Does it matter how many
distinct values field has (which is related to how many matches/results for
one given/selected value is returned and also with how many times same filter
instance is reused)?
I *think* it matters. I think the more docs a filter matches, the higher the benefit from reusing a filter.
For example, what if filter for single value matches
only 5% of docs, should filter be used or is it better to use TermQuery?
What about if filter for single value matches 20%? or 50% or
75%
I'm not sure...
I have a question regarding caching performance/memory usage.
Documents have date&time indexed (as NumericField) with minute resolution
and there are few thousands unique date&time in index. On the search
side open ended range filter is used (NumericRangeFilter) with current
time as a parameter.
Now, is it wise to cache NumericRangeFilter here
(reuse instance of CachingWrapperFilter wrapping NumericRangeFilter) since it
will not be reused often (only from users searching at same time in same time
zone)?
If the cache hit rate is low, why waste memory on caching is what I would think is the logic to apply here.
If you have 3 queries, and each uses a different date range query, then you will not see benefits from caching..
If 2 of those 3 queries use the exact same date range query, then you will see caching benefits.
Is it better to use NumericRangeFilter or NumericRangeQuery in this case?
I'm not sure, but I'd be happy to add specific advice to Javadoc when the answer is clear.

Otis

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

Discussion Posts

Previous

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 2 of 2 | next ›
Discussion Overview
groupjava-user @
categorieslucene
postedMay 7, '10 at 6:35p
activeMay 9, '10 at 4:24p
posts2
users2
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase