Hi,
when is it wise to replace a TermQuery with a cached Filter (with regard to
search performance)? If a TermQuery is used only to filter results by
field value (it doesn't participate in scoring), is it always wise to
replace it with a filter? Or is it only wise if the Filter is cached (wrapped
in CachingWrapperFilter) and reused often?


Does it matter how many distinct values the field has (which affects how
many matches are returned for a given value, and also how many times the
same filter instance is reused)?

For example, if the filter for a single value matches only 5% of the docs,
should a filter be used, or is it better to use a TermQuery?

What if the filter for a single value matches 20%, 50%, or 75%?

I also have a question about caching performance and memory usage. Documents
have a date and time indexed (as a NumericField) with minute resolution, and
there are a few thousand unique date-time values in the index. On the search
side, an open-ended range filter (NumericRangeFilter) is used, with the
current time as a parameter.

Now, is it wise to cache the NumericRangeFilter here (reuse an instance of
CachingWrapperFilter wrapping the NumericRangeFilter), given that it will not
be reused often (only by users searching at the same time in the same time
zone)?

Is it better to use NumericRangeFilter or NumericRangeQuery in this
case?

Tomislav

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Otis Gospodnetic at May 9, 2010 at 4:24 pm
    I think others will have more thoughts on this, esp. for Numeric* questions... but I'll try answering...

    ----- Original Message ----
    From: Tomislav Poljak <tpoljak@gmail.com>
    To: java-user@lucene.apache.org
    Sent: Fri, May 7, 2010 2:34:46 PM
    Subject: Filter vs. TermQuery performance

    > Hi,
    > when is it wise to replace a TermQuery with a cached Filter
    > (with regard to search performance)? If a TermQuery is used only to filter
    > results by field value (it doesn't participate in scoring), is it always
    > wise to replace it with a filter?

    Yes, assuming the filter will be reused. I don't think there is much value in using a filter (vs. just a regular query) if that filter will not be reused. This is why Solr's "fq"s (filter queries) are cached in a special filter cache. I *think* the only other benefit of using a filter vs., say, a TermQuery, is that the former will not spend any time/CPU on computing the score for the filter part.

    > Is it only wise if the Filter is cached (wrapped in CachingWrapperFilter) and reused often?

    I think so. See above.

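The reuse argument above can be illustrated with a toy model. This is only a sketch in plain Java, not Lucene code: the class `FilterCacheSketch`, its in-memory "index" array, and the `scans` counter are all hypothetical names invented for the illustration. It assumes a cached filter behaves roughly like "compute the set of matching documents once, then hand back the same bit set on every later request".

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a cached filter (hypothetical, not the Lucene API).
final class FilterCacheSketch {
    // docs[i] holds the value of a single field for document i.
    private final String[] docs;
    // Previously computed match sets, keyed by the field value filtered on.
    private final Map<String, BitSet> cache = new HashMap<>();
    // Number of times the full scan actually ran.
    private int scans = 0;

    FilterCacheSketch(String[] docs) {
        this.docs = docs;
    }

    // Uncached path: visit every document and mark the matching ones.
    private BitSet computeMatches(String value) {
        scans++;
        BitSet bits = new BitSet(docs.length);
        for (int i = 0; i < docs.length; i++) {
            if (docs[i].equals(value)) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Cached path: scan only the first time a value is requested.
    // The returned BitSet is shared, so callers must not modify it.
    BitSet matches(String value) {
        return cache.computeIfAbsent(value, this::computeMatches);
    }

    int scans() {
        return scans;
    }

    public static void main(String[] args) {
        FilterCacheSketch index =
            new FilterCacheSketch(new String[] {"red", "blue", "red", "green"});
        for (int i = 0; i < 3; i++) {
            index.matches("red"); // same filter requested three times
        }
        System.out.println("matches: " + index.matches("red").cardinality());
        System.out.println("scans:   " + index.scans());
    }
}
```

In this model the cache only pays off from the second request for the same value onward; a value that is requested once costs one scan plus the memory held by its bit set.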
    > Does it matter how many distinct values the field has (which affects how
    > many matches are returned for a given value, and also how many times the
    > same filter instance is reused)?

    I *think* it matters. I think the more docs a filter matches, the greater the benefit from reusing it.

    > For example, if the filter for a single value matches only 5% of the docs,
    > should a filter be used, or is it better to use a TermQuery?
    > What if the filter for a single value matches 20%, 50%, or 75%?

    I'm not sure...

    > I have a question about caching performance and memory usage.
    > Documents have a date and time indexed (as a NumericField) with minute
    > resolution, and there are a few thousand unique date-time values in the
    > index. On the search side, an open-ended range filter (NumericRangeFilter)
    > is used, with the current time as a parameter.
    > Now, is it wise to cache the NumericRangeFilter here (reuse an instance of
    > CachingWrapperFilter wrapping the NumericRangeFilter), given that it will
    > not be reused often (only by users searching at the same time in the same
    > time zone)?

    If the cache hit rate is low, why waste memory on caching? That is the logic I would apply here.
    If you have 3 queries and each uses a different date range, you will not see any benefit from caching.
    If 2 of those 3 queries use the exact same date range, you will see caching benefits.

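The three-query example can be made concrete by counting cache hits. The sketch below is plain Java rather than Lucene code, and every name in it (`RangeCacheHitSketch`, `lookup`, `truncateToMinute`) is hypothetical. It models the cache as a set of keys derived from the time bound of each range. Truncating the bound to the minute is an assumption added here to show how the choice of key affects the hit rate; it is not something the thread proposes, and whether rounding the bound is acceptable depends on the application.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongUnaryOperator;

// Counts cache hits for range filters keyed by their time bound (toy model).
final class RangeCacheHitSketch {
    private final Set<Long> cachedKeys = new HashSet<>();
    // Maps the raw time bound (epoch millis) to the cache key.
    private final LongUnaryOperator keyFn;
    private int hits = 0;
    private int misses = 0;

    RangeCacheHitSketch(LongUnaryOperator keyFn) {
        this.keyFn = keyFn;
    }

    // Records one search whose range filter is bounded by boundMillis.
    void lookup(long boundMillis) {
        long key = keyFn.applyAsLong(boundMillis);
        if (cachedKeys.add(key)) {
            misses++; // key was not cached yet: the filter must be computed
        } else {
            hits++;   // an identical range was already cached
        }
    }

    int hits() { return hits; }

    int misses() { return misses; }

    // Rounds a timestamp down to the start of its minute.
    static long truncateToMinute(long millis) {
        return millis - Math.floorMod(millis, 60_000L);
    }

    public static void main(String[] args) {
        long base = 60_000L * 1_000; // an arbitrary minute boundary
        long[] searches = {base + 1_000, base + 30_000, base + 61_000};

        RangeCacheHitSketch exact = new RangeCacheHitSketch(LongUnaryOperator.identity());
        RangeCacheHitSketch perMinute =
            new RangeCacheHitSketch(RangeCacheHitSketch::truncateToMinute);
        for (long t : searches) {
            exact.lookup(t);
            perMinute.lookup(t);
        }
        System.out.println("exact key:      hits=" + exact.hits() + " misses=" + exact.misses());
        System.out.println("per-minute key: hits=" + perMinute.hits() + " misses=" + perMinute.misses());
    }
}
```

With the exact timestamp as the key, three searches at three different instants never share an entry. With a per-minute key, the two searches that fall in the same minute share one entry, which is the "2 of those 3 queries" case.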
    > Is it better to use NumericRangeFilter or NumericRangeQuery in this case?

    I'm not sure, but I'd be happy to add specific advice to the Javadoc when the answer is clear.

    Otis


Discussion Overview
group: java-user
categories: lucene
posted: May 7, '10 at 6:35p
active: May 9, '10 at 4:24p
posts: 2
users: 2
website: lucene.apache.org
