Grokbase Groups Lucene dev June 2011
improved compound file handling
-------------------------------

Key: LUCENE-3201
URL: https://issues.apache.org/jira/browse/LUCENE-3201
Project: Lucene - Java
Issue Type: Improvement
Reporter: Robert Muir


Currently CompoundFileReader could use some improvements. I see the following problems:
* Its CSIndexInput extends BufferedIndexInput, which is wasteful for directories like MMapDirectory.
* It seeks on every readInternal() call.
* It's not possible for a Directory to override or improve the handling of compound files.

For example: if you were implementing this thing from scratch, it seems you would just wrap the IndexInput directly (not extend BufferedIndexInput),
add the compound file offset X to seek() calls, and override length(). But of course, then you couldn't always throw "read past EOF" when you should,
as a user could read into the next file and be left unaware.
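
To make that trade-off concrete, here is a minimal sketch of the "wrap and add an offset" idea. It is not Lucene code: `OffsetInput` and its methods are hypothetical names, and a plain byte array stands in for the wrapped IndexInput. The unchecked read silently returns bytes of the next sub-file, while the checked variant pays for an extra comparison on every read.

```java
import java.io.EOFException;

/** Hypothetical stand-in for a sub-file view that only adds an offset (not Lucene code). */
final class OffsetInput {
    private final byte[] compound; // stands in for the whole compound file
    private final long offset;     // where this sub-file starts
    private final long length;     // how long this sub-file is
    private long pos;              // position relative to the sub-file

    OffsetInput(byte[] compound, long offset, long length) {
        this.compound = compound;
        this.offset = offset;
        this.length = length;
    }

    void seek(long p) { pos = p; }

    long length() { return length; }

    /** Naive read: translates the position but never checks the sub-file's end. */
    byte readByteUnchecked() {
        return compound[(int) (offset + pos++)];
    }

    /** Checked read: the explicit test is what restores "read past EOF". */
    byte readByte() throws EOFException {
        if (pos >= length) {
            throw new EOFException("read past EOF: pos=" + pos + " length=" + length);
        }
        return compound[(int) (offset + pos++)];
    }
}
```

With two three-byte sub-files packed back to back, seeking the first view to position 3 and calling `readByteUnchecked()` returns the first byte of the second sub-file, whereas `readByte()` throws.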

However, some directories could handle this better. For example, MMapDirectory could return an IndexInput that simply mmaps the 'slice' of the CFS file.
Its underlying ByteBuffer naturally does bounds checks already, so it wouldn't need to be buffered, and it wouldn't even need to add any offsets to seek(),
as its position would just work.
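
The slice idea can be illustrated with plain java.nio, independent of Lucene. This is only a sketch: `SliceSketch` is a hypothetical name, and a heap buffer stands in for the MappedByteBuffer a real MMapDirectory would obtain from the file. The sliced view starts at position 0 and refuses reads past its own end, so no offset arithmetic or explicit EOF check is needed.

```java
import java.nio.ByteBuffer;

/** Sketch: carve a sub-file view out of one big buffer (not Lucene code). */
final class SliceSketch {
    /** Returns an independent view of [offset, offset + length) whose position starts at 0. */
    static ByteBuffer slice(ByteBuffer whole, int offset, int length) {
        ByteBuffer dup = whole.duplicate(); // own position/limit, shared content
        dup.position(offset);
        dup.limit(offset + length);
        return dup.slice();                 // position 0, limit == length
    }
}
```

Reading beyond the slice raises `java.nio.BufferUnderflowException` instead of returning bytes from the neighbouring sub-file, which is the bounds check the description relies on.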

So I think we should try to refactor this so that a Directory can customize how compound files are handled. The simplest
case, with the least code change, would be to add this to Directory.java:

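{code}
// Default implementation; a Directory subclass may override this to return a specialized reader.
public Directory openCompoundInput(String filename) throws IOException {
  return new CompoundFileReader(this, filename);
}
{code}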

This works because most code depends on the fact that compound files are implemented as a Directory and are transparent. At least then a subclass could override it,
but the 'recursion' is a little ugly. We could still label it expert+internal+experimental or whatever.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


  • Michael McCandless (JIRA) at Jun 14, 2011 at 12:03 am
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048906#comment-13048906 ]

    Michael McCandless commented on LUCENE-3201:
    --------------------------------------------

    +1
  • Robert Muir (JIRA) at Jun 14, 2011 at 12:13 am
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048912#comment-13048912 ]

    Robert Muir commented on LUCENE-3201:
    -------------------------------------

    I think for this one, I prefer to wait for Uwe's refactoring of MMap on LUCENE-3200.
    Then mmap is simpler, and I think we can even use the same IndexInput implementation here.

    This would mean no slowdown when searching CFS.

  • Robert Muir (JIRA) at Jun 14, 2011 at 5:14 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-3201:
    --------------------------------

    Attachment: LUCENE-3201.patch

    Initial patch for review. In this patch I only cut over MMapDirectory to using a special CompoundFileDirectory; all others use the default as before (but I cleaned up some things about it).

    Pretty sure I can easily improve SimpleFS and NIOFS. I'll take a look at that now, but I wanted to get this up for review.

  • Michael McCandless (JIRA) at Jun 14, 2011 at 5:20 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049287#comment-13049287 ]

    Michael McCandless commented on LUCENE-3201:
    --------------------------------------------

    Patch looks great! Incredible that this means there's no penalty at all at search time when using CFS, if you use MMapDir.

    I like that the CFS reader is now under oal.store (org.apache.lucene.store), not .index.
  • Robert Muir (JIRA) at Jun 14, 2011 at 5:30 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-3201:
    --------------------------------

    Fix Version/s: 3.3, 4.0

    Setting 3.3/4.0 as the fix version, as the changes are backwards compatible (CompoundFileReader is still package-private in 3.x).

  • Uwe Schindler (JIRA) at Jun 14, 2011 at 6:57 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049335#comment-13049335 ]

    Uwe Schindler commented on LUCENE-3201:
    ---------------------------------------

    Hi Robert, great patch, exactly as I would have wished to have it when we discussed it!

    Patch looks fine, one small bug:
    - FileSwitchDirectory should also override openCompoundInput() from Directory and delegate to the correct underlying directory. Right now it always uses the default impl, which does double buffering. So if you e.g. put MMapDirectory as the delegate for CFS files, those files would be opened like before your patch. Just copy'n'paste the code from one of the other FileSwitchDirectory methods.
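
A rough sketch of the delegation being asked for, using stripped-down stand-ins rather than Lucene's real classes. Every type below (`SketchDirectory`, `SketchMMapDirectory`, `SketchFileSwitchDirectory`) is hypothetical, and the hook returns a descriptive String instead of a real compound-file Directory to keep the example short.

```java
import java.util.Set;

/** Hypothetical, stripped-down stand-in for Lucene's Directory (not the real API). */
class SketchDirectory {
    private final String label;

    SketchDirectory(String label) { this.label = label; }

    /** Default hook: a generic, buffered compound reader (described by a String here). */
    String openCompoundInput(String name) {
        return "default buffered reader for " + name + " in " + label;
    }
}

/** Stand-in for a directory with a specialized compound reader, e.g. an mmap-based one. */
class SketchMMapDirectory extends SketchDirectory {
    SketchMMapDirectory() { super("mmap"); }

    @Override
    String openCompoundInput(String name) {
        return "mmap slice reader for " + name;
    }
}

/** Routes files by extension, in the spirit of FileSwitchDirectory, and forwards the hook. */
class SketchFileSwitchDirectory extends SketchDirectory {
    private final Set<String> primaryExtensions;
    private final SketchDirectory primary;
    private final SketchDirectory secondary;

    SketchFileSwitchDirectory(Set<String> primaryExtensions,
                              SketchDirectory primary, SketchDirectory secondary) {
        super("switch");
        this.primaryExtensions = primaryExtensions;
        this.primary = primary;
        this.secondary = secondary;
    }

    private SketchDirectory delegateFor(String name) {
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1);
        return primaryExtensions.contains(ext) ? primary : secondary;
    }

    @Override
    String openCompoundInput(String name) {
        // Without this override the inherited default would be used,
        // bypassing the delegate's specialized reader.
        return delegateFor(name).openCompoundInput(name);
    }
}
```

With "cfs" routed to the mmap stand-in, opening `_0.cfs` goes through the specialized reader, while any other extension falls through to the secondary directory's default.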

    Some suggestions:
    We currently map the whole compound file into address space, read the header/contents and unmap it again. This may be some overhead, especially if unmapping is not supported.
    - We could use SimpleFSIndexInput to read the CFS contents (we only need to pass the already open RAF there; alternatively use Dawid's new wrapper IndexInput around a standard InputStream, obtained from the RAF -> LUCENE-3202)
    - Only map the header of the CFS file. The problem: we don't know the exact size.
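
As an illustration of the first suggestion, the sketch below reads an entry table with ordinary RandomAccessFile reads and then maps only the bytes of one sub-file. The header layout (an int count followed by name, offset and length per entry) is invented for this example and is not Lucene's CFS format. `HeaderThenMapSketch` and `Entry` are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: the header layout here is invented, not Lucene's CFS format. */
final class HeaderThenMapSketch {
    /** One sub-file entry: where it starts in the compound file and how long it is. */
    static final class Entry {
        final long offset;
        final long length;

        Entry(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Reads the hypothetical entry table with plain file reads, so nothing is mapped. */
    static Map<String, Entry> readEntries(RandomAccessFile raf) throws IOException {
        raf.seek(0);
        int count = raf.readInt();
        Map<String, Entry> entries = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String name = raf.readUTF();
            long offset = raf.readLong();
            long length = raf.readLong();
            entries.put(name, new Entry(offset, length));
        }
        return entries;
    }

    /** Maps only the bytes of one sub-file instead of the whole compound file. */
    static MappedByteBuffer mapSlice(RandomAccessFile raf, Entry e) throws IOException {
        return raf.getChannel().map(FileChannel.MapMode.READ_ONLY, e.offset, e.length);
    }
}
```

Reading the table through the already open file handle sidesteps the "we don't know the exact size" problem, because no mapping has to be sized before the header has been parsed.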
  • Robert Muir (JIRA) at Jun 14, 2011 at 7:11 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049346#comment-13049346 ]

    Robert Muir commented on LUCENE-3201:
    -------------------------------------

    I agree, FileSwitchDirectory should delegate openCompoundInput().

    As far as mapping small things goes, I think we should set this aside for another issue.
    As far as this issue goes, I don't mind returning the DefaultCompound impl if unmapping isn't supported, but I'd really rather defer opening the can of worms of 'mapping small things' to some other issue :)

  • Uwe Schindler (JIRA) at Jun 14, 2011 at 7:21 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049352#comment-13049352 ]

    Uwe Schindler commented on LUCENE-3201:
    ---------------------------------------

    We have LUCENE-1743 for the small files can of worms.
  • Robert Muir (JIRA) at Jun 14, 2011 at 9:43 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-3201:
    --------------------------------

    Attachment: LUCENE-3201.patch

    Here is an updated patch, including impls for SimpleFS and NIOFS, fixing the FileSwitchDirectory thing Uwe mentioned, and also covering MockDirectoryWrapper and NRTCachingDirectory.

    All the tests pass with Simple/NIO/MMap, but we need to benchmark. I haven't had good luck today with luceneutil.
  • Uwe Schindler (JIRA) at Jun 14, 2011 at 10:01 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049460#comment-13049460 ]

    Uwe Schindler commented on LUCENE-3201:
    ---------------------------------------

    Robert: Very nice. Small thing:

    - NIOFSCompoundFileDirectory / SimpleFSCompoundFileDirectory are non-static inner classes but still get the parent Directory in the ctor. This is duplicated, as javac also passes the parent around (the special ParentClassName.this reference). I would remove the ctor param and use "Simple/NIO-FSDirectory.this" as the reference to the outer class. I nitpick because in some places it references the parent directory without the ctor param, so it's inconsistent.

    That's all for now, thanks for the hard work!
  • Uwe Schindler (JIRA) at Jun 14, 2011 at 10:09 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049460#comment-13049460 ]

    Uwe Schindler edited comment on LUCENE-3201 at 6/14/11 10:07 PM:
    -----------------------------------------------------------------

    Robert: Very nice. Small thing:

    - NIOFSCompoundFileDirectory / SimpleFSCompoundFileDirectory / MMapCompoundFileDirectory are non-static inner classes but still get the parent Directory in the ctor. This is duplicated, as javac also passes the parent around (the special ParentClassName.this reference). I would remove the ctor param and use "*FSDirectory.this" as the reference to the outer class. I nitpick because in some places it references the parent directory without the ctor param, so it's inconsistent.

    That's all for now, thanks for the hard work!
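
For readers less familiar with the language detail behind this nitpick, here is a small sketch of the two styles. The class names are hypothetical and are not the classes from the patch. A non-static inner class already carries a reference to its enclosing instance, reachable as `Outer.this`, so an explicit constructor parameter holding the same object is redundant.

```java
/** Sketch of the nitpick; names are hypothetical, not the classes from the patch. */
class OuterDirectory {
    private final String path;

    OuterDirectory(String path) { this.path = path; }

    /** Redundant style: a non-static inner class that is also handed its parent explicitly. */
    class RedundantCompoundDir {
        private final OuterDirectory parent;

        RedundantCompoundDir(OuterDirectory parent) { this.parent = parent; }

        String describe() { return "compound in " + parent.path; }
    }

    /** Leaner style: rely on the implicit reference to the enclosing instance. */
    class LeanCompoundDir {
        String describe() { return "compound in " + OuterDirectory.this.path; }
    }

    RedundantCompoundDir openRedundant() { return new RedundantCompoundDir(this); }

    LeanCompoundDir openLean() { return new LeanCompoundDir(); }
}
```

Both variants see the same outer instance. The second simply avoids storing it twice and keeps all references to the parent in one consistent form.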

    improved compound file handling
    -------------------------------

    Key: LUCENE-3201
    URL: https://issues.apache.org/jira/browse/LUCENE-3201
    Project: Lucene - Java
    Issue Type: Improvement
    Reporter: Robert Muir
    Fix For: 3.3, 4.0

    Attachments: LUCENE-3201.patch, LUCENE-3201.patch


  • Simon Willnauer (JIRA) at Jun 20, 2011 at 7:01 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052156#comment-13052156 ]

    Simon Willnauer commented on LUCENE-3201:
    -----------------------------------------

    this seems ready to commit... I think we should get that in so I can take it further on LUCENE-3218

    Robert, is it OK with you if I commit this, or are you going to do it?

    simon
  • Robert Muir (JIRA) at Jun 20, 2011 at 7:05 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052159#comment-13052159 ]

    Robert Muir commented on LUCENE-3201:
    -------------------------------------

    I didn't commit because I didn't measure any performance improvements from the patch (this frustrated me).
    Also, I didn't address Uwe's last comment...

    In general, I was thinking that this would be a good performance win, but it isn't. So we should consider it from a refactoring perspective only.

  • Simon Willnauer (JIRA) at Jun 21, 2011 at 3:56 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Simon Willnauer resolved LUCENE-3201.
    -------------------------------------

    Resolution: Fixed
    Assignee: Simon Willnauer

    Incorporated in LUCENE-3218; I will track backporting there.
  • Robert Muir (JIRA) at Jun 26, 2011 at 11:07 am
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-3201:
    --------------------------------

    Fix Version/s: 3.4 (was: 3.3)
  • Robert Muir (JIRA) at Aug 20, 2011 at 9:11 am
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-3201:
    --------------------------------

    Priority: Blocker (was: Major)
  • Robert Muir (JIRA) at Aug 20, 2011 at 9:11 am
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir reopened LUCENE-3201:
    ---------------------------------


    Reopening: as with LUCENE-3218, I think we should pull this stuff back and revisit.
  • Uwe Schindler (JIRA) at Aug 20, 2011 at 9:19 am
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088168#comment-13088168 ]

    Uwe Schindler commented on LUCENE-3201:
    ---------------------------------------

    During code review I found a problem in the MMap special handling regarding number of open files:

    The default CFS Reader opens one file handle for the CFS and then maps slices using CFIndexInput. On the other hand, MMap's CFS directory impl does a separate mapping for each slice. To map this slice, it opens a new file handle, mmaps the slice, and closes the file handle.

    The question is now: will this file handle then be occupied until the mapping disappears? If this is the case, we could get TooManyOpenFiles even for CFS, as each sub-file would occupy one file handle. At the least, the MMap-specific CFS reader should use the same RAF all the time and keep it open for mapping.
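
The question above turns on what happens to a mapping once the handle it was created from is closed. As a minimal sketch in plain java.nio (not Lucene's MMapDirectory code; MmapSliceDemo and mapSlices are made-up names for illustration), the following maps several slices through a single RandomAccessFile and reads them after that handle has been closed. The FileChannel.map documentation states that a mapping, once established, does not depend on the channel used to create it. How the operating system accounts for the underlying file is platform-specific and is not demonstrated here.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.BufferUnderflowException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MmapSliceDemo {
    /**
     * Maps each {offset, length} range of the file read-only, using ONE file
     * handle for all ranges, and closes that handle before returning.
     */
    public static MappedByteBuffer[] mapSlices(Path file, long[][] ranges) throws IOException {
        MappedByteBuffer[] slices = new MappedByteBuffer[ranges.length];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            FileChannel channel = raf.getChannel();
            for (int i = 0; i < ranges.length; i++) {
                slices[i] = channel.map(FileChannel.MapMode.READ_ONLY, ranges[i][0], ranges[i][1]);
            }
        } // the handle is closed here; the mappings remain valid
        return slices;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a compound file holding two "sub-files" of 3 and 5 bytes.
        Path file = Files.createTempFile("demo", ".cfs");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {10, 11, 12, 13, 14, 15, 16, 17});

        MappedByteBuffer[] slices = mapSlices(file, new long[][] {{0, 3}, {3, 5}});
        System.out.println(slices[0].get(0)); // 10
        System.out.println(slices[1].get(0)); // 13, position 0 of the second slice

        // Each slice has its own limit, so reading past its end fails instead
        // of silently running into the next sub-file.
        slices[0].position(slices[0].limit());
        try {
            slices[0].get();
        } catch (BufferUnderflowException e) {
            System.out.println("read past end of slice rejected");
        }
    }
}
```

Each slice starts at position 0 and carries its own bounds, which is the property the issue description relies on when it says the slice's position "would just work".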
  • Robert Muir (JIRA) at Aug 20, 2011 at 9:21 am
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088170#comment-13088170 ]

    Robert Muir commented on LUCENE-3201:
    -------------------------------------

    That's just not true... but it illustrates my point that this stuff is complicated, and I think we need to take the safe option here and back it out.
  • Uwe Schindler (JIRA) at Aug 20, 2011 at 9:21 am
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088169#comment-13088169 ]

    Uwe Schindler commented on LUCENE-3201:
    ---------------------------------------

    My last comment was wrong, the impl was changed before commit so it reuses RAF.
  • Uwe Schindler (JIRA) at Aug 20, 2011 at 9:21 am
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Uwe Schindler updated LUCENE-3201:
    ----------------------------------

    Comment: was deleted

    (was: During code review I found a problem in the MMap special handling regarding number of open files:

    The default CFS Reader opens one file handle for the CFS and then maps slices using CFIndexInput. On the other hand, MMap's CFS directory impl does a separate mapping for each slice. To map this slice, it opens a new file handle, mmaps the slice, and closes the file handle.

    The question is now: will this file handle then be occupied until the mapping disappears? If this is the case, we could get TooManyOpenFiles even for CFS, as each sub-file would occupy one file handle. At the least, the MMap-specific CFS reader should use the same RAF all the time and keep it open for mapping.)
  • Uwe Schindler (JIRA) at Aug 20, 2011 at 9:27 am
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088172#comment-13088172 ]

    Uwe Schindler commented on LUCENE-3201:
    ---------------------------------------

    I reverted my comment :-)
  • Michael McCandless (JIRA) at Sep 14, 2011 at 10:13 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Michael McCandless updated LUCENE-3201:
    ---------------------------------------

    Fix Version/s: 3.5 (was: 3.4)
  • Simon Willnauer (Commented) (JIRA) at Oct 2, 2011 at 9:50 am
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118988#comment-13118988 ]

    Simon Willnauer commented on LUCENE-3201:
    -----------------------------------------

    I think we can close this issue unless we plan to backport the CFS changes to 3.x? Opinions?
  • Robert Muir (Updated) (JIRA) at Oct 7, 2011 at 7:35 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Muir updated LUCENE-3201:
    --------------------------------

    Priority: Minor (was: Blocker)

    not a blocker, it was pulled from 3.x (and fixed in trunk)
  • Simon Willnauer (Resolved) (JIRA) at Oct 7, 2011 at 9:58 pm
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Simon Willnauer resolved LUCENE-3201.
    -------------------------------------

    Resolution: Fixed
    Fix Version/s: (was: 3.5)

    I am closing this... if we feel like porting we can still reopen

Discussion Overview
group: dev
categories: lucene
posted: Jun 13, '11 at 11:59p
active: Oct 7, '11 at 9:58p
posts: 27
users: 1
website: lucene.apache.org
