FAQ
Problem trying to merge indexes in the background whilst building some others. It works okay on my humble laptop but fails on another machine, even though that machine seems to allow 700,000 file handles.

Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: /home/musicbrainz/search_server/data/recording_index/_rs.cfs (Too many open files)
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:347)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:312)
Caused by: java.io.FileNotFoundException: /home/robert/musicbrainz/search_server/data/recording_index/_rs.cfs (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:76)
at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97)
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:98)
at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:67)
at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:115)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:605)
at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:622)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4394)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4000)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:231)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:288)

Anyone got ideas on how I can get to the bottom of this? I'm using Lucene 3.0.3.

thanks, Paul


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Michael McCandless at Apr 4, 2011 at 7:14 pm
    How are you merging these indices? (IW.addIndexes?).

    Are you changing any of IW's defaults, eg mergeFactor?

    Mike

    http://blog.mikemccandless.com
    On Mon, Apr 4, 2011 at 3:05 PM, Paul Taylor wrote:
    [...]
  • Paul Taylor at Apr 4, 2011 at 8:00 pm

    On 04/04/2011 20:13, Michael McCandless wrote:
    How are you merging these indices? (IW.addIndexes?).

    Are you changing any of IW's defaults, eg mergeFactor?

    Mike
    Hi Mike

    I have

    indexWriter.setMaxBufferedDocs(10000);
    indexWriter.setMergeFactor(3000);

    these are a hangover from earlier code; I tried changing them and it
    didn't seem to make any difference, but do they look wrong?

    The index that falls over during optimization creates about 10,000,000 records.

    What I do is build 10 different indexes sequentially, one after the other
    (so only one is hitting the db at any one time), but once an index is
    built I optimize it in the background by creating an instance of the
    following class and submitting it to an ExecutorService configured to
    have at most 2 threads running.

    I built my own class rather than just using optimize() with the
    background option, because that wouldn't allow me to do the necessary
    debugging/calculations.


    static class IndexWriterOptimizerAndClose implements Callable<Boolean>
    {
        private int maxId;
        private IndexWriter indexWriter;
        private DatabaseIndex index;
        private IndexOptions options;

        public IndexWriterOptimizerAndClose(int maxId, IndexWriter indexWriter,
                                            DatabaseIndex index, IndexOptions options)
        {
            this.maxId = maxId;
            this.indexWriter = indexWriter;
            this.index = index;
            this.options = options;
        }

        public Boolean call() throws IOException, SQLException
        {
            StopWatch clock = new StopWatch();
            clock.start();
            String path = options.getIndexesDir() + index.getFilename();
            System.out.println(index.getName() + ":Started Optimization at "
                    + Utils.formatCurrentTimeForOutput());

            indexWriter.optimize();
            indexWriter.close();
            clock.stop();
            // For debugging, to check the sql is not creating too few/many rows
            if (true) {
                int dbRows = index.getNoOfRows(maxId);
                IndexReader reader = IndexReader.open(FSDirectory.open(new File(path)), true);
                System.out.println(index.getName() + ":" + dbRows + " db rows:"
                        + (reader.maxDoc() - 1) + " lucene docs");
                reader.close();
            }
            System.out.println(index.getName() + ":Finished Optimization:"
                    + Utils.formatClock(clock));

            return true;
        }
    }
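    The submission side Paul describes (an ExecutorService capped at two threads, one Callable per built index) can be sketched with plain JDK classes. The index names and the trivial task body below are hypothetical stand-ins for IndexWriterOptimizerAndClose, just to show the pattern:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundOptimizeDemo {
    public static void main(String[] args) throws Exception {
        // At most two optimize tasks run concurrently, as described above
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<Boolean>> pending = new ArrayList<>();
        for (String name : new String[] {"artist_index", "recording_index", "label_index"}) {
            // Stand-in for new IndexWriterOptimizerAndClose(maxId, writer, index, options)
            Callable<Boolean> task = () -> {
                System.out.println(name + ": optimize finished");
                return true;
            };
            pending.add(pool.submit(task));
        }
        for (Future<Boolean> f : pending) {
            System.out.println("result: " + f.get()); // blocks until that task completes
        }
        pool.shutdown();
    }
}
```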

  • Simon Willnauer at Apr 4, 2011 at 8:07 pm

    On Mon, Apr 4, 2011 at 9:59 PM, Paul Taylor wrote:
    On 04/04/2011 20:13, Michael McCandless wrote:

    How are you merging these indices?  (IW.addIndexes?).

    Are you changing any of IW's defaults, eg mergeFactor?

    Mike
    Hi Mike

    I have

    indexWriter.setMaxBufferedDocs(10000);
    indexWriter.setMergeFactor(3000);
    I didn't read through the entire email, but a mergeFactor of 3000
    doesn't look right at all. You should try something between 5 and 30;
    I haven't seen a mergeFactor > 50 doing any good in a common
    environment. Why do you use such a large factor?

    simon
    [...]
  • Michael McCandless at Apr 5, 2011 at 9:46 am
    Yeah, that mergeFactor is way too high and will cause
    too-many-open-files (if the index has enough segments).

    Also, you should setRamBufferSizeMB instead of maxBufferedDocs, for
    faster index throughput.

    Calling optimize from two threads doesn't help it run faster when
    using ConcurrentMergeScheduler (the default). I.e., with CMS, optimize
    simply waits for CMS to perform all the necessary merges.

    Mike

    http://blog.mikemccandless.com
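    Why a mergeFactor of 3000 blows up on segment (and hence file) counts can be sketched with a toy model. This is not Lucene's actual LogMergePolicy, just an illustration under simplifying assumptions: every flush produces a fixed-size segment, and whenever mergeFactor segments pile up at one size level they merge into one segment a level up. The doc counts match this thread (~10M docs, flushes of 10,000):

```java
import java.util.HashMap;
import java.util.Map;

public class MergeModel {
    /** Peak number of live segments under the toy logarithmic merge model. */
    static int peakSegments(long totalDocs, int flushDocs, int mergeFactor) {
        Map<Integer, Integer> perLevel = new HashMap<>(); // level -> segment count
        int peak = 0;
        for (long flushed = 0; flushed + flushDocs <= totalDocs; flushed += flushDocs) {
            perLevel.merge(0, 1, Integer::sum);           // a flush adds a level-0 segment
            int lvl = 0;
            while (perLevel.getOrDefault(lvl, 0) == mergeFactor) { // cascade merges upward
                perLevel.put(lvl, 0);
                perLevel.merge(lvl + 1, 1, Integer::sum);
                lvl++;
            }
            int live = perLevel.values().stream().mapToInt(Integer::intValue).sum();
            peak = Math.max(peak, live);
        }
        return peak;
    }

    public static void main(String[] args) {
        // mergeFactor 3000 never triggers a merge in 1000 flushes: 1000 segments pile up
        System.out.println("mergeFactor 3000: " + peakSegments(10_000_000, 10_000, 3000));
        // mergeFactor 10 keeps the segment count bounded (peaks at 27 here)
        System.out.println("mergeFactor 10:   " + peakSegments(10_000_000, 10_000, 10));
    }
}
```

    Each live segment needs one or more open files (more during a merge), so the two settings differ by well over an order of magnitude in descriptor pressure.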

    On Mon, Apr 4, 2011 at 4:06 PM, Simon Willnauer wrote:
    [...]
  • Paul Taylor at Apr 6, 2011 at 8:51 am

    On 04/04/2011 21:06, Simon Willnauer wrote:

    I didn't read through the entire email but MergeFactor 3000 doesn't
    look right at all. You should try something between 5 and 30. I
    haven't seen a mergeFactor > 50 doing any good in a common env. Why do
    you use such a large factor?

    simon
    I decreased the merge factor to 10, and it still fails.
    Is there a way I can find out how many open file handles Lucene will need?
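    On the handle-counting side, the JVM can at least report how many descriptors the indexing process currently holds and what its hard ceiling is, via the HotSpot-specific com.sun.management API (this measures current usage; it does not predict what a given merge will need):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdUsage {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            // Available on Unix-like platforms with a HotSpot JVM
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            System.out.println("open fds: " + unix.getOpenFileDescriptorCount());
            System.out.println("max fds:  " + unix.getMaxFileDescriptorCount());
        }
    }
}
```

    Polling this periodically during indexing would show whether descriptor usage climbs toward the limit as merges stack up.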

  • Michael McCandless at Apr 6, 2011 at 10:45 am
    Can you turn on IndexWriter's infoStream, get the failure to happen,
    and post the resulting output?

    How are you adding the multiple indices together? Can you post the
    code that does that?

    The number of open file handles needed during indexing is a function
    of how many merges are running and how large (how many segments) each
    of those merges are. Lucene in Action 2, chapter 11, has a cool graph
    showing this... (and also file handle usage by IndexReader, reopening
    periodically).

    Mike

    http://blog.mikemccandless.com
    On Wed, Apr 6, 2011 at 4:50 AM, Paul Taylor wrote:
    [...]

Discussion Overview
group: java-user @ lucene.apache.org
categories: lucene
posted: Apr 4, '11 at 7:06p
active: Apr 6, '11 at 10:45a
posts: 7
users: 3