[ https://issues.apache.org/jira/browse/LUCENE-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601504#action_12601504 ]

Paul Elschot commented on LUCENE-1296:
--------------------------------------

I tried to come up with a sensible performance test to determine a good criterion for choosing between OpenBitSet and SortedVIntList as the DocIdSet supporting data structure to be cached.
There is a criterion for this in the patch, in the docIdSetToCache() method of CachingWrapperFilter, but it is based only on byte size, and it favours SortedVIntList when it is definitely more compact than OpenBitSet.

The current criterion prefers SortedVIntList over OpenBitSet for caching when cardinality (the number of bits set in the OpenBitSet) < maxDocs/9. The constant 9 might be replaced by a configuration parameter to allow easy performance experiments. It could be that a value larger than 9 turns out to be "optimal" at runtime.
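
As a rough illustration of the size test described above, here is a minimal sketch in plain Java. It does not use the Lucene classes; the class name, method names and the configurable divisor are hypothetical, and the VInt size estimate assumes gaps between doc ids are stored with 7 payload bits per byte.

```java
class CacheChoiceSketch {
    /** Hypothetical helper: true when the sparse (sorted VInt list) form would be preferred. */
    static boolean preferSortedVIntList(long cardinality, int maxDocs, int divisor) {
        return cardinality < maxDocs / divisor;
    }

    /** Bytes for a bit set covering maxDocs documents, one bit per document. */
    static long bitSetBytes(int maxDocs) {
        return (maxDocs + 7L) / 8;
    }

    /** Bytes needed to store the gaps between sorted doc ids as VInts (assumed encoding). */
    static long vIntListBytes(int[] sortedDocIds) {
        long bytes = 0;
        int previous = 0;
        for (int doc : sortedDocIds) {
            int gap = doc - previous;
            do {
                bytes++;      // every value needs at least one byte
                gap >>>= 7;   // each byte carries 7 payload bits
            } while (gap != 0);
            previous = doc;
        }
        return bytes;
    }

    public static void main(String[] args) {
        int maxDocs = 1000;
        int[] docs = {5, 200, 900};
        System.out.println("prefer list: " + preferSortedVIntList(docs.length, maxDocs, 9));
        System.out.println("bit set bytes: " + bitSetBytes(maxDocs));
        System.out.println("vint list bytes: " + vIntListBytes(docs));
    }
}
```

Making the divisor a parameter is what would allow the experiments mentioned above: a larger value shifts more filters to the bit set form.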

In some cases OpenBitSet can be faster on skipTo(int docNum) than SortedVIntList, even when SortedVIntList is more compact. As Filters can be expected to use skipTo() heavily, this could be important for performance.
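
To make the skipTo() difference concrete, the following sketch contrasts the two access patterns. It is an assumption-laden stand-in, not the Lucene code: java.util.BitSet plays the role of OpenBitSet, the byte array is a hand-rolled VInt gap list, and the list version restarts from the beginning on each call for simplicity.

```java
import java.io.ByteArrayOutputStream;
import java.util.BitSet;

class SkipToSketch {
    /** Encodes sorted doc ids as VInt gaps: 7 payload bits per byte, high bit means "more bytes follow". */
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int doc : sortedDocIds) {
            int gap = doc - previous;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
            previous = doc;
        }
        return out.toByteArray();
    }

    /** Returns the first doc id >= target, or -1. Every gap before the target must be decoded. */
    static int skipToInList(byte[] encoded, int target) {
        int pos = 0;
        int doc = 0;
        while (pos < encoded.length) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = encoded[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            doc += gap;
            if (doc >= target) {
                return doc;
            }
        }
        return -1;
    }

    /** Returns the first doc id >= target, or -1. The bit set can jump straight to the target word. */
    static int skipToInBitSet(BitSet bits, int target) {
        return bits.nextSetBit(target);
    }
}
```

The list form pays for every intermediate entry, while the bit set pays for the words scanned after the target position, which is one way the more compact structure can still lose on skipTo().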

Even though it might be possible to measure the skipTo() performance directly, the effect of the more compact cached data structure of SortedVIntList on garbage collection is (pretty close to) impossible to measure in a simple test case.

Eks Dev had some interesting results there in the very early stages of LUCENE-584 (September 2006), so I wonder whether these results could be confirmed somehow using the patch here and the current trunk.

Comments?



Allow use of compact DocIdSet in CachingWrapperFilter
-----------------------------------------------------

Key: LUCENE-1296
URL: https://issues.apache.org/jira/browse/LUCENE-1296
Project: Lucene - Java
Issue Type: New Feature
Components: Search
Reporter: Paul Elschot
Assignee: Michael Busch
Priority: Minor
Attachments: cachedFilter20080529.patch


Extends CachingWrapperFilter with a protected method to determine the DocIdSet to be cached.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


  • Paul Elschot (JIRA) at Jun 2, 2008 at 6:22 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601699#action_12601699 ]

    Paul Elschot commented on LUCENE-1296:
    --------------------------------------

    For the record: the patch of 20080529 leaves some imports of SortedVIntList unused.
  • Paul Elschot (JIRA) at Jun 3, 2008 at 7:13 pm
    [ https://issues.apache.org/jira/browse/LUCENE-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600907#action_12600907 ]

    paul.elschot@xs4all.nl edited comment on LUCENE-1296 at 6/3/08 12:11 PM:
    ---------------------------------------------------------------

    The 20080529 patch patches CachingWrapperFilter and its test to add a choice of a compact filter to be cached, as well as some recently patched ( LUCENE-1187 ) contrib filter classes to remove the corresponding functionality there.


    was (Author: paul.elschot@xs4all.nl):
    The 20080529 patch patches CachingWrapperFilter and its test to add a choice of a compact filter to be cached, as well as some recently patched contrib filter classes to remove the corresponding functionality there.

  • Paul Elschot (JIRA) at Jun 5, 2008 at 7:26 am
    [ https://issues.apache.org/jira/browse/LUCENE-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Paul Elschot updated LUCENE-1296:
    ---------------------------------

    Attachment: cachedFilter20080605.patch

    In the 20080605 patch the docIdSetToCache method simply returns its argument, which would normally be an OpenBitSet when using a Filter from the core. Anyone who wants to have another filter data structure cached can override this method.
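
The override pattern described here can be sketched as follows. This is a self-contained illustration with stand-in types, not the patch itself: CachingWrapper, CompactCachingWrapper, the method signature and the use of java.util.BitSet and int[] are all assumptions made for the example.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

class CachingHookSketch {
    /** Stand-in for a caching filter wrapper; not Lucene's CachingWrapperFilter. */
    static class CachingWrapper {
        private final Map<String, Object> cache = new HashMap<>();

        /** Hook: by default the computed set is cached unchanged. */
        protected Object docIdSetToCache(BitSet computed, int maxDocs) {
            return computed;
        }

        /** Returns the cached structure for a reader key, computing it on first use. */
        Object get(String readerKey, BitSet computed, int maxDocs) {
            return cache.computeIfAbsent(readerKey, k -> docIdSetToCache(computed, maxDocs));
        }
    }

    /** Subclass that caches a sorted int array instead of the bit set when the set is sparse. */
    static class CompactCachingWrapper extends CachingWrapper {
        @Override
        protected Object docIdSetToCache(BitSet computed, int maxDocs) {
            if (computed.cardinality() < maxDocs / 9) {
                return computed.stream().toArray(); // set bits in ascending order
            }
            return computed;
        }
    }
}
```

Keeping the default as an identity hook leaves existing behaviour untouched, so only callers that subclass pay for the extra decision.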
  • Paul Elschot (JIRA) at Jun 5, 2008 at 7:28 am
    [ https://issues.apache.org/jira/browse/LUCENE-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Paul Elschot updated LUCENE-1296:
    ---------------------------------

    Attachment: cachedFilter20080605.patch

    Once more, with licence granted to ASF.
  • Paul Elschot (JIRA) at Jun 5, 2008 at 7:28 am
    [ https://issues.apache.org/jira/browse/LUCENE-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Paul Elschot updated LUCENE-1296:
    ---------------------------------

    Attachment: (was: cachedFilter20080605.patch)

Discussion Overview
group: java-dev
categories: lucene
posted: Jun 1, '08 at 10:52p
active: Jun 5, '08 at 7:28a
posts: 6
users: 1
website: lucene.apache.org
