Grokbase Groups Lucene dev March 2011
[ https://issues.apache.org/jira/browse/LUCENE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2026:
------------------------------------

Labels: gsoc2011 lucene-gsoc-11 (was: )
Refactoring of IndexWriter
--------------------------

Key: LUCENE-2026
URL: https://issues.apache.org/jira/browse/LUCENE-2026
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
Labels: gsoc2011, lucene-gsoc-11
Fix For: 4.0


I've been thinking for a while about refactoring the IndexWriter into
two main components.
One could be called a SegmentWriter and, as the name says, its job
would be to write one particular index segment. The default one, just
as today, would provide methods to add documents and would flush when
its buffer is full.
Other SegmentWriter implementations would do things like appending or
copying external segments [what addIndexes*() currently does].
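
A minimal sketch of how the SegmentWriter idea might look, using plain Java collections rather than Lucene classes. Apart from SegmentWriter, every name here (BufferingSegmentWriter, CopyingSegmentWriter, finish(), maxBufferedDocs) is an assumption made for illustration; none of it is an existing or proposed Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: writes exactly one index segment. Documents are modeled as strings. */
interface SegmentWriter {
    /** Seals the segment and returns a copy of its documents. */
    List<String> finish();
}

/** Default flavor: buffers added documents and reports when the buffer is full. */
final class BufferingSegmentWriter implements SegmentWriter {
    private final int maxBufferedDocs;
    private final List<String> buffer = new ArrayList<>();

    BufferingSegmentWriter(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Adds a document; returns true once the buffer is full and a flush is due. */
    boolean addDocument(String doc) {
        buffer.add(doc);
        return buffer.size() >= maxBufferedDocs;
    }

    @Override
    public List<String> finish() {
        return new ArrayList<>(buffer);
    }
}

/** Alternative flavor: copies an external segment, roughly the job of addIndexes*(). */
final class CopyingSegmentWriter implements SegmentWriter {
    private final List<String> external;

    CopyingSegmentWriter(List<String> externalSegment) {
        this.external = new ArrayList<>(externalSegment);
    }

    @Override
    public List<String> finish() {
        return new ArrayList<>(external);
    }
}
```

Both flavors produce one segment through the same finish() call, so whatever manages the segment list would not need to know how a segment was produced.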
The second component's job would be to manage writing the segments
file and merging/deleting segments. It would know about
DeletionPolicy, MergePolicy and MergeScheduler. Ideally it would
provide hooks that allow users to manage external data structures and
keep them in sync with Lucene's data during segment merges.
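
The hook idea could be sketched roughly as follows. SegmentManager, MergeListener, onMerge() and mergeAll() are hypothetical names chosen for this example, segments are reduced to their names, and mergeAll() is a stand-in for whatever a MergePolicy would actually select.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hook, called after a merge so external data can be remapped. */
interface MergeListener {
    void onMerge(List<String> mergedSegments, String newSegment);
}

/** Hypothetical second component: owns the segment list and performs merges. */
final class SegmentManager {
    private final List<String> segments = new ArrayList<>();
    private final List<MergeListener> listeners = new ArrayList<>();
    private int nextId = 0;

    void addListener(MergeListener listener) {
        listeners.add(listener);
    }

    /** Registers a finished segment and returns its generated name. */
    String addSegment() {
        String name = "_" + nextId++;
        segments.add(name);
        return name;
    }

    /** Merges all current segments into a new one, then notifies the listeners. */
    String mergeAll() {
        List<String> merged = new ArrayList<>(segments);
        String target = "_" + nextId++;
        segments.clear();
        segments.add(target);
        for (MergeListener listener : listeners) {
            listener.onMerge(merged, target);
        }
        return target;
    }

    /** Returns a copy of the current segment names. */
    List<String> segments() {
        return new ArrayList<>(segments);
    }
}
```

A user keeping per-segment side data (for example a map keyed by segment name) would register a listener and move its entries from the merged segments to the new one inside onMerge().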
API-wise, there are things we have to figure out, such as where the
updateDocument() method would fit in, because its deletion part
affects all segments, whereas the new document is only being added to
the new segment.
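
To make the updateDocument() problem concrete, here is a toy model, not Lucene code. UpdateCoordinator and its methods are invented for this sketch, a plain string id stands in for the delete term, and deletes remove entries outright instead of marking documents as deleted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical coordinator showing the two halves of an update. */
final class UpdateCoordinator {
    // Already flushed segments; each maps a document id to its content.
    private final List<Map<String, String>> flushed = new ArrayList<>();
    // The segment currently being written.
    private Map<String, String> current = new LinkedHashMap<>();

    void addDocument(String id, String content) {
        current.put(id, content);
    }

    /** Seals the current segment and starts a new, empty one. */
    void flush() {
        flushed.add(current);
        current = new LinkedHashMap<>();
    }

    /** The delete half touches every segment; the add half only the current one. */
    void updateDocument(String id, String content) {
        for (Map<String, String> segment : flushed) {
            segment.remove(id);
        }
        current.put(id, content);
    }

    /** Counts the copies of the given id across all segments. */
    int liveCopies(String id) {
        int count = current.containsKey(id) ? 1 : 0;
        for (Map<String, String> segment : flushed) {
            if (segment.containsKey(id)) {
                count++;
            }
        }
        return count;
    }

    /** Returns the content held in the current segment, or null if absent. */
    String inCurrentSegment(String id) {
        return current.get(id);
    }
}
```

The delete loop needs the full segment list while the add needs only the segment being written, which is why the method does not fit cleanly into a writer that knows about a single segment.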
Of course these should be lower-level APIs for things like parallel
indexing and related use cases. That's why we should still provide
easy-to-use APIs like today for people who don't need to care about
per-segment ops during indexing. So the current IndexWriter could
probably keep most of its APIs and delegate to the new classes.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


  • Simon Willnauer (JIRA) at Mar 14, 2011 at 2:38 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Simon Willnauer updated LUCENE-2026:
    ------------------------------------

    Labels: mentor (was: gsoc2011 lucene-gsoc-11)
  • Simon Willnauer (JIRA) at Mar 14, 2011 at 2:39 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Simon Willnauer updated LUCENE-2026:
    ------------------------------------

    Labels: gsoc2011, lucene-gsoc-11 mentor, (was: mentor)
  • Simon Willnauer (JIRA) at Mar 14, 2011 at 5:15 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Simon Willnauer updated LUCENE-2026:
    ------------------------------------

    Labels: gsoc2011 lucene-gsoc-11 mentor (was: gsoc2011, lucene-gsoc-11 mentor,)

Discussion Overview
group: dev @
categories: lucene
posted: Mar 9, '11 at 9:22p
active: Mar 14, '11 at 5:15p
posts: 4
users: 1
website: lucene.apache.org
