Realtime terms dictionary
-------------------------

Key: LUCENE-3245
URL: https://issues.apache.org/jira/browse/LUCENE-3245
Project: Lucene - Java
Issue Type: Improvement
Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor


For LUCENE-2312 we need a realtime terms dictionary. While ConcurrentSkipListMap may be used, its high per-object overhead can hurt garbage collection times and heap memory usage.

If we implement a skip list that uses primitive backing arrays, we can hopefully have a data structure that is as fast and also memory efficient.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


  • Jason Rutherglen (JIRA) at Jun 27, 2011 at 4:47 am
    [ https://issues.apache.org/jira/browse/LUCENE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Jason Rutherglen updated LUCENE-3245:
    -------------------------------------

    Attachment: LUCENE-3245.patch

    Here's a basic initial patch implementing a single-writer, multiple-reader skip list backed by an AtomicIntegerArray (AIA).

    The next step is to tie in the ByteBlockPool to store terms, i.e., implement an RTTermsDictAIA class and an RTTermsDictCSLM class.

    We can then load the same Wiki-EN terms into each and compare write speeds.

    Then create a set of terms to look up in each terms dict and measure the time difference.

    I am not yet sure how the speed of AtomicIntegerArray will compare with CSLM's usage of AtomicReferenceFieldUpdater. Of note is that, because of DWPTs (DocumentsWriterPerThread), we do not need a skip list that supports concurrent writes. And because we only add new unique terms, we do not need delete functionality. That is, AIA could be faster, though we may need to inline code and apply various tuning tricks.
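To make the single-writer, multiple-reader idea concrete, here is a minimal sketch of a skip list stored in one AtomicIntegerArray. It is not the code from the attached patch: the class name, the node layout, the fixed capacity, and the use of plain int keys in place of terms stored in a ByteBlockPool are all assumptions made for illustration. The writer fills in a new node completely before linking it, so a reader that follows a pointer to the node should also see its key and forward pointers.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Hypothetical sketch: single-writer, multi-reader skip list of int keys. */
final class AiaSkipListSketch {
    private static final int MAX_LEVEL = 8;
    private static final int HEAD = 0; // the head node lives at offset 0
    private static final int NIL = 0;  // no node ever points at the head, so 0 doubles as "null"

    // Node layout at offset p: [key, next at level 0, next at level 1, ...]
    private final AtomicIntegerArray data;
    private final int[] starts = new int[MAX_LEVEL]; // writer-only scratch, reused by every insert
    private final Random random = new Random(42);    // writer-only
    private int used = 1 + MAX_LEVEL;                // writer-only: first free offset after the head

    AiaSkipListSketch(int capacityInInts) {
        // Assumption: fixed capacity; a real implementation would need to grow.
        data = new AtomicIntegerArray(Math.max(capacityInInts, 1 + MAX_LEVEL));
    }

    /** Must be called from the single writer thread. Returns false for a duplicate key. */
    boolean insert(int key) {
        int node = HEAD;
        for (int lvl = MAX_LEVEL - 1; lvl >= 0; lvl--) {
            node = advance(node, lvl, key);
            starts[lvl] = node; // last node at this level whose key is < key
        }
        int candidate = data.get(starts[0] + 1);
        if (candidate != NIL && data.get(candidate) == key) {
            return false;
        }
        int height = 1;
        while (height < MAX_LEVEL && random.nextBoolean()) {
            height++;
        }
        if (used + 1 + height > data.length()) {
            throw new IllegalStateException("skip list is full");
        }
        int newNode = used;
        used += 1 + height;

        // Fill in the new node first; it is not reachable by readers yet.
        data.set(newNode, key);
        for (int lvl = 0; lvl < height; lvl++) {
            data.set(newNode + 1 + lvl, data.get(starts[lvl] + 1 + lvl));
        }
        // Publish bottom-up: each volatile write makes the node visible at one more level.
        for (int lvl = 0; lvl < height; lvl++) {
            data.set(starts[lvl] + 1 + lvl, newNode);
        }
        return true;
    }

    /** May be called from any thread. */
    boolean contains(int key) {
        int node = HEAD;
        for (int lvl = MAX_LEVEL - 1; lvl >= 0; lvl--) {
            node = advance(node, lvl, key);
        }
        int candidate = data.get(node + 1);
        return candidate != NIL && data.get(candidate) == key;
    }

    /** Walks right at one level while the next key is smaller than the target. */
    private int advance(int node, int lvl, int key) {
        int next = data.get(node + 1 + lvl);
        while (next != NIL && data.get(next) < key) {
            node = next;
            next = data.get(node + 1 + lvl);
        }
        return node;
    }

    public static void main(String[] args) {
        AiaSkipListSketch list = new AiaSkipListSketch(1024);
        list.insert(30);
        list.insert(10);
        list.insert(20);
        System.out.println(list.contains(20) + " " + list.contains(25));
    }
}
```

Because there is no delete and only one writer, the sketch needs no compare-and-set loops; every update is a plain volatile write via AtomicIntegerArray.set.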
  • Jason Rutherglen (JIRA) at Jun 27, 2011 at 5:49 am
    [ https://issues.apache.org/jira/browse/LUCENE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Jason Rutherglen updated LUCENE-3245:
    -------------------------------------

    Attachment: LUCENE-3245.patch

    Added, and fixed, the code that descends the skip list to the level-zero linked list and iterates over it.

    Next, I need to reuse the starts int array.
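The descent to the level-zero linked list followed by iteration can be sketched roughly as below. This is an illustration only, not the patch's code: the class name, the parallel keys/next arrays, and the deterministic build (node i is linked at level lvl when i is a multiple of 2^lvl) are assumptions chosen so the example stays short and needs no random level assignment. The input must be sorted and free of duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: descend a primitive-array skip list, then walk level zero. */
final class LevelZeroIterationSketch {
    private static final int MAX_LEVEL = 3;
    private static final int NIL = -1;
    private static final int HEAD = 0;

    private final int[] keys; // keys[i] is the key of node i; node 0 is the head sentinel
    private final int[] next; // next[i * MAX_LEVEL + lvl] is node i's successor at level lvl

    /** Builds the list from ascending, distinct keys. */
    LevelZeroIterationSketch(int[] sortedKeys) {
        int n = sortedKeys.length;
        keys = new int[n + 1];
        System.arraycopy(sortedKeys, 0, keys, 1, n);
        next = new int[(n + 1) * MAX_LEVEL];
        Arrays.fill(next, NIL);
        for (int i = 0; i <= n; i++) {
            for (int lvl = 0; lvl < MAX_LEVEL; lvl++) {
                int stride = 1 << lvl;
                // Link node i to the node one stride ahead, if both exist at this level.
                if (i % stride == 0 && i + stride <= n) {
                    next[i * MAX_LEVEL + lvl] = i + stride;
                }
            }
        }
    }

    /** Returns every key greater than or equal to target, in ascending order. */
    List<Integer> tailFrom(int target) {
        int node = HEAD;
        // Descend from the top level, moving right while the next key is still too small.
        for (int lvl = MAX_LEVEL - 1; lvl >= 0; lvl--) {
            int candidate = next[node * MAX_LEVEL + lvl];
            while (candidate != NIL && keys[candidate] < target) {
                node = candidate;
                candidate = next[node * MAX_LEVEL + lvl];
            }
        }
        // node is now the last node with key < target; iterate the level-zero linked list.
        List<Integer> result = new ArrayList<>();
        for (int cur = next[node * MAX_LEVEL]; cur != NIL; cur = next[cur * MAX_LEVEL]) {
            result.add(keys[cur]);
        }
        return result;
    }

    public static void main(String[] args) {
        LevelZeroIterationSketch list =
            new LevelZeroIterationSketch(new int[] {2, 4, 6, 8, 10, 12, 14});
        System.out.println(list.tailFrom(7));
    }
}
```

The upper levels are used only to find the starting point; once level zero is reached, every remaining key is visited by following a single chain of next pointers.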
  • Jason Rutherglen (JIRA) at Jun 27, 2011 at 7:22 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Jason Rutherglen updated LUCENE-3245:
    -------------------------------------

    Attachment: LUCENE-3245.patch

    Here's a cut with a first implementation of the CSLM and AIA terms dictionaries.

    I think we're ready to benchmark writes.
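A write and lookup timing run for the ConcurrentSkipListMap baseline might look something like the sketch below. The class name, the helper methods, and the synthetic terms (standing in for the Wiki-EN terms, which are not reproduced here) are assumptions for illustration; the AIA-based dictionary would be timed the same way. Timings from System.nanoTime in a single pass are rough, so a proper harness with warm-up iterations would be needed for numbers worth comparing.

```java
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch: time writes and lookups against a ConcurrentSkipListMap. */
final class CslmTimingSketch {

    /** Stand-in for a real term set: n distinct synthetic terms in non-sorted order. */
    static String[] syntheticTerms(int n) {
        String[] terms = new String[n];
        for (int i = 0; i < n; i++) {
            // Multiplying by an odd constant scrambles the order but keeps terms distinct.
            terms[i] = "term" + Integer.toHexString(i * 0x9E3779B1);
        }
        return terms;
    }

    /** Inserts every term and returns the elapsed time in nanoseconds. */
    static long timeWrites(ConcurrentSkipListMap<String, Integer> dict, String[] terms) {
        long start = System.nanoTime();
        for (int i = 0; i < terms.length; i++) {
            dict.put(terms[i], i);
        }
        return System.nanoTime() - start;
    }

    /** Looks up every term and returns the elapsed time in nanoseconds. */
    static long timeLookups(ConcurrentSkipListMap<String, Integer> dict, String[] terms) {
        int hits = 0;
        long start = System.nanoTime();
        for (String term : terms) {
            if (dict.containsKey(term)) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (hits != terms.length) {
            throw new IllegalStateException("expected every term to be found");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String[] terms = syntheticTerms(100_000);
        ConcurrentSkipListMap<String, Integer> dict = new ConcurrentSkipListMap<>();
        System.out.println("writes (ns):  " + timeWrites(dict, terms));
        System.out.println("lookups (ns): " + timeLookups(dict, terms));
    }
}
```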

Discussion Overview
group: dev @
categories: lucene
posted: Jun 27, 2011 at 4:11 am
active: Jun 27, 2011 at 7:22 pm
posts: 4
users: 1
website: lucene.apache.org
