I read the Lucene in Action book and just tested
FSversusRAMDirectoryTest.java with the following uncommented:

// /**
// // change to adjust performance of indexing with FSDirectory
writer.mergeFactor = 100;
writer.maxMergeDocs = 999999;
writer.minMergeDocs = 1000;
// */

Here is the output:

RAMDirectory Time: 805 ms
FSDirectory Time : 728 ms


  • Michael McCandless at Jun 6, 2011 at 3:58 pm
    This test is very old (from the 1st edition of the book but removed
    from the 2nd).

    Modern OSes cache newly written files in RAM, and this test doesn't
    write very large files (I think?), so the test is really comparing the
    OS's I/O cache against Lucene's RAMDirectory.

    That said, I'm not sure why RAMDirectory would be slower... FSDirectory
    still must go through the OS APIs even if the OS then caches in RAM.

    Mike McCandless

    http://blog.mikemccandless.com

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
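
To illustrate the point above about the OS caching newly written files, here is a minimal Java sketch. It is not taken from the book's test or from Lucene. It writes the same bytes to an on-heap buffer, to a temp file, and to a temp file followed by FileChannel.force(true), which asks the OS to push the data to the storage device. The names WriteTargets, writeToHeap and writeToFile and the 4 MB payload are assumptions made for illustration, and the single-shot timings printed by main are only indicative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative only: the same payload written to the heap and to a file. */
final class WriteTargets {

    /** Writes the payload to an on-heap buffer and returns the byte count. */
    static long writeToHeap(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(payload, 0, payload.length);
        return out.size();
    }

    /**
     * Writes the payload to a temp file and returns the resulting file size.
     * With sync == false the data may still sit in the OS cache on return;
     * with sync == true, force(true) requests a flush to the storage device.
     */
    static long writeToFile(byte[] payload, boolean sync) {
        try {
            Path file = Files.createTempFile("write-target", ".bin");
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(payload);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                if (sync) {
                    ch.force(true);
                }
            }
            long size = Files.size(file);
            Files.delete(file);
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[4 * 1024 * 1024]; // assumed size, small like the test's index
        long t0 = System.nanoTime();
        writeToHeap(payload);
        long t1 = System.nanoTime();
        writeToFile(payload, false);
        long t2 = System.nanoTime();
        writeToFile(payload, true);
        long t3 = System.nanoTime();
        System.out.printf("heap: %d us, file: %d us, file+force: %d us%n",
                (t1 - t0) / 1000, (t2 - t1) / 1000, (t3 - t2) / 1000);
    }
}
```

If the unsynced file write comes out close to the heap write while the forced write is clearly slower, that would be consistent with the small-file case mostly exercising the OS cache rather than the disk.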
  • Uwe Schindler at Jun 6, 2011 at 4:04 pm
    Hi,

    It depends on the Lucene version: if the test uses a recent Lucene on a
    64-bit OS, FSDirectory.open() may return an MMapDirectory. In that case
    the test compares like with like, since both read from RAM :-)

    The difference may also be caused by not warming up HotSpot's compiler.
    Such timings are only meaningful if you repeat the same code path many
    times and use only the later results, not those from the first
    iterations (when Java's HotSpot is still optimizing).

    We did not get any information about how these numbers were measured.

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
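
The warm-up advice above can be sketched as a small JDK-only harness: run the task a number of times without recording anything, then time a further batch and report the median. This is a minimal sketch under assumptions, not the book's test. The class name WarmupBenchmark, the iteration counts and the stand-in workload are all made up for illustration; in the thread's test the workload would be the call to timeIndexWriter.

```java
import java.util.Arrays;

/** Minimal timing harness: discards warm-up iterations, reports the median of the rest. */
final class WarmupBenchmark {

    /**
     * Runs the task (warmup + measured) times and returns the median elapsed
     * time in nanoseconds over the measured iterations only.
     */
    static long medianNanos(Runnable task, int warmup, int measured) {
        if (warmup < 0 || measured < 1) {
            throw new IllegalArgumentException("need warmup >= 0 and measured >= 1");
        }
        for (int i = 0; i < warmup; i++) {
            task.run(); // result discarded: HotSpot may still be compiling this path
        }
        long[] samples = new long[measured];
        for (int i = 0; i < measured; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    /** Median of the samples (upper middle for even counts); the input is not modified. */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Stand-in workload, assumed for illustration only.
        Runnable work = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new AssertionError("workload produced no output");
            }
        };
        System.out.println("median ns: " + medianNanos(work, 20, 11));
    }
}
```

The median is used rather than the mean so that a single slow iteration (for example one interrupted by garbage collection) does not dominate the reported figure.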

  • Zhoucheng2008 at Jun 6, 2011 at 4:29 pm
    I ran it on 64-bit Windows 7 with Lucene 3.0.3. The result that FSDirectory outperforms RAMDirectory in this case seems to be consistent, as I ran a bunch of tests.

    My wild guess is that FSDirectory can leverage the MMapDirectory advantage as well as the three tuning parameters. Just a thought.

  • Toke Eskildsen at Jun 7, 2011 at 8:28 am

    On Mon, 2011-06-06 at 15:29 +0200, zhoucheng2008 wrote:
    I read the lucene in action book and just tested the
    FSversusRAMDirectoryTest.java with the following uncommented:
    [...]
    Here is the output:

    RAMDirectory Time: 805 ms
    FSDirectory Time : 728 ms

    This is the code, right?
    http://java.codefetch.com/example/in/LuceneInAction/src/lia/indexing/FSversusRAMDirectoryTest.java

    The test is problematic because the two tests run sequentially in the
    same JVM, so the order in which they run can affect the timings.

    If you change
        long ramTiming = timeIndexWriter(ramDir);
        long fsTiming = timeIndexWriter(fsDir);
    to
        long fsTiming = timeIndexWriter(fsDir);
        long ramTiming = timeIndexWriter(ramDir);
    my guess is that RAMDirectory will be faster. For a better
    comparison, perform each test in separate runs (make a test
    class just for RAMDirectory and one just for FSDirectory,
    then run them one at a time, each in its own JVM).

    One big problem when comparing RAMDirectory to file-access
    is caching. What you measure with a test might not be what
    you see in production, as the production index might be
    large compared to RAM available for file caching.
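
The suggestion above to run each test in its own JVM could be sketched with ProcessBuilder as follows. This is only a sketch under assumptions: SeparateJvmRunner, RamDirectoryOnlyTest and FsDirectoryOnlyTest are hypothetical names, and the two test classes are assumed to each have a main method that times exactly one Directory implementation and to be on the current classpath.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Launches each benchmark class in a fresh JVM so neither run warms up the other. */
final class SeparateJvmRunner {

    /** Builds the command that starts mainClass in a new JVM with this JVM's classpath. */
    static List<String> childJvmCommand(String mainClass) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        return cmd;
    }

    /** Starts the class in its own JVM, forwards its output, and returns the exit code. */
    static int runInFreshJvm(String mainClass) throws Exception {
        Process process = new ProcessBuilder(childJvmCommand(mainClass))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical class names; replace with your own single-purpose test classes.
        String[] benchmarks = {"RamDirectoryOnlyTest", "FsDirectoryOnlyTest"};
        for (String benchmark : benchmarks) {
            System.out.println(benchmark + " exit code: " + runInFreshJvm(benchmark));
        }
    }
}
```

Separate JVMs remove JIT and heap state carried over from the first run, but not the OS file cache, so the caching caveat in this post still applies.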


  • Zhoucheng2008 at Jun 7, 2011 at 9:54 am
    Makes sense. Thanks

  • Lance Norskog at Jun 17, 2011 at 1:16 am
    The RAMDirectory uses Java heap memory; an FSDirectory does not.
    Holding Java memory makes garbage collection work harder. The operating
    system is very good at managing disk buffers, and does a better job of
    using spare memory than Java does.

    For real-world sites, RAMDirectory is almost always useless. Maybe the
    Instantiated index stuff is more what you want?

    Lance
    --
    Lance Norskog
    goksron@gmail.com

  • Sanne Grinovero at Jun 17, 2011 at 9:43 am
    Hello,
    I came to similar conclusions, and have a similar comparison test
    available here:
    https://github.com/infinispan/infinispan/blob/master/lucene-directory/src/test/java/org/infinispan/lucene/profiling/PerformanceCompareStressTest.java

    In my test I explicitly run the RAMDirectory first to warm up the JVM
    and the other Lucene components. I default to a short testing time, but
    to perform a fair comparison you should:
    a) make the test quite long - a couple of hours
    b) start with a fairly large index; this version starts with an empty
    index that grows slowly.

    I run the RAMDirectory first because, to be fair, in my case I wasn't
    very interested in its performance: being limited to the memory
    available to your JVM is IMHO quite a dealbreaker for real applications.
    Also, the operating system can apply several smart caches when there's
    enough memory. My conclusion is that when you have memory, you should
    limit the JVM heap and leave the rest to the OS so that it can make
    better use of FSDirectory, as this implementation is really well
    optimized, at least for local disks.

    When you don't have enough available memory, I would suggest - but
    warning: I'm biased - trying the Infinispan-based Lucene Directory,
    which is able to "join forces" across the memory of multiple (remote)
    JVMs and passivate to external storage such as disk only when strictly
    needed (or for backups/shutdown). Being still mostly an in-memory
    solution, it's able to outperform the FSDirectory during write
    operations and is comparable in search performance; in some cases it is
    a little slower, but it compensates by being able to scale horizontally
    with real-time distribution. A current limitation is that you still
    need to use a single IndexWriter, even cluster-wide. The code is very
    simple and directly mimics the FSDirectory logic, so, unlike other
    distributed solutions, it supports all the same features and inherits
    the same limitations.

    Regards,
    Sanne
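
As a rough aid for the heap-limit point above, the following sketch reads the JVM's configured maximum heap (for example as set with -Xmx) and checks whether an index of a given size would fit in a chosen share of it. The class HeapBudget, the method fitsInHeap, the 2 GB index size and the 50% usable fraction are all assumptions for illustration; nothing here comes from Lucene or Infinispan.

```java
/** Illustrative check of whether an in-heap index could fit in this JVM. */
final class HeapBudget {

    /** Maximum heap the JVM will attempt to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /**
     * Returns true if indexBytes fits within usableFraction of maxHeapBytes.
     * The fraction is an assumed safety margin, since the application needs
     * heap for everything else as well.
     */
    static boolean fitsInHeap(long indexBytes, long maxHeapBytes, double usableFraction) {
        if (indexBytes < 0 || maxHeapBytes <= 0
                || usableFraction <= 0.0 || usableFraction > 1.0) {
            throw new IllegalArgumentException("invalid sizes or fraction");
        }
        return indexBytes <= (long) (maxHeapBytes * usableFraction);
    }

    public static void main(String[] args) {
        long assumedIndexBytes = 2L * 1024 * 1024 * 1024; // hypothetical 2 GB index
        long maxHeap = maxHeapBytes();
        System.out.println("max heap bytes: " + maxHeap);
        System.out.println("2 GB index fits in half the heap: "
                + fitsInHeap(assumedIndexBytes, maxHeap, 0.5));
    }
}
```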


Discussion Overview
group: java-user
category: lucene
posted: Jun 6, '11 at 1:30p
active: Jun 17, '11 at 9:43a
posts: 8
users: 6
website: lucene.apache.org
