Hello all,
I know you Lucene devs did a lot of work on indexing performance in 2.3,
and I just tested it out last Thursday, so I thought I'd let you know how it
fared:

On a 2.17 million document index, a recent test gave these indexing times:

* Lucene 2.2: 4.83 hours
* Lucene 2.3: 26 minutes

About a factor of 11 speedup. Holy smokes! Great work folks.


-jake
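
For anyone who wants to check the arithmetic behind the "factor of 11" figure, here is a minimal sketch. The class and method names are made up for illustration; only the three input numbers come from the post above. It works out to a speedup of about 11.1, or roughly 125 docs/s on 2.2 versus roughly 1,390 docs/s on 2.3.

```java
// Back-of-the-envelope check of the numbers reported above.
// IndexingSpeedup is a hypothetical helper, not part of Lucene.
class IndexingSpeedup {
    // Ratio of the old run time to the new one; both in the same unit.
    static double speedup(double oldMinutes, double newMinutes) {
        return oldMinutes / newMinutes;
    }

    // Average throughput in documents per second.
    static double docsPerSecond(long docs, double minutes) {
        return docs / (minutes * 60.0);
    }

    public static void main(String[] args) {
        long docs = 2_170_000L;       // 2.17 million documents
        double v22 = 4.83 * 60.0;     // Lucene 2.2: 4.83 hours, in minutes
        double v23 = 26.0;            // Lucene 2.3: 26 minutes
        System.out.printf("speedup: %.1fx%n", speedup(v22, v23));
        System.out.printf("2.2: %.0f docs/s%n", docsPerSecond(docs, v22));
        System.out.printf("2.3: %.0f docs/s%n", docsPerSecond(docs, v23));
    }
}
```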

  • Michael McCandless at Feb 3, 2008 at 10:11 pm
    Awesome! We are glad to hear that :)

    You might be able to make it even faster with the steps here:

    http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

    Mike

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Jake Mannix at Feb 4, 2008 at 12:47 am
    Yeah, I should have mentioned - this was merely with a jar replacement. We
    haven't gotten around to doing fun 2.3-related stuff like making sure our
    domain-specific tokenizers use next(Token), or making sure we set
    all of our buffer sizes by RAM used.

    We tried multithreading the process, as we have a multi-core, multi-disk
    architecture, but for some reason we never saw more than 99% (of one core)
    CPU usage during indexing, as if some internal synchronization was getting
    hit... I should try it again through the profiler and see if I can pinpoint
    where it was getting tripped up. On the other hand, I'm not sure we
    *need* faster than 26-minute indexing, so once we're sure we can move up to
    2.3 for production, that may just solve our indexing perf issues.

    Now if I can just figure out how to speed up our query performance too, I'll
    be in an even *better* mood. :)

    -jake
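
The next(Token) style mentioned above is, as far as the name suggests, about letting the caller hand in a token object to be refilled instead of allocating a fresh one per term. Below is a toy sketch of that reuse pattern only. ToyToken and ToyWhitespaceTokenizer are hypothetical stand-ins invented for this example and are not Lucene classes; a real domain-specific tokenizer would follow whatever contract the Lucene 2.3 javadocs specify.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT Lucene classes.
class ToyToken {
    char[] buffer = new char[16];
    int length;

    // Copy text[start, end) into the reusable buffer.
    void set(String text, int start, int end) {
        int n = end - start;
        if (buffer.length < n) buffer = new char[n];  // grow only when needed
        text.getChars(start, end, buffer, 0);
        length = n;
    }

    String term() { return new String(buffer, 0, length); }
}

class ToyWhitespaceTokenizer {
    private final String text;
    private int pos = 0;

    ToyWhitespaceTokenizer(String text) { this.text = text; }

    // Allocating style: a fresh token object on every call.
    ToyToken next() { return next(new ToyToken()); }

    // Reuse style: fill the caller-supplied token; null means end of input.
    ToyToken next(ToyToken reuse) {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
        if (pos >= text.length()) return null;
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) pos++;
        reuse.set(text, start, pos);
        return reuse;
    }

    // Consume a whole string using a single token object.
    static List<String> terms(String text) {
        ToyWhitespaceTokenizer tok = new ToyWhitespaceTokenizer(text);
        ToyToken scratch = new ToyToken();
        List<String> out = new ArrayList<>();
        while (tok.next(scratch) != null) out.add(scratch.term());
        return out;
    }
}
```

The consumer has to copy out whatever it needs (here via term()) before asking for the next token, since the same object is overwritten on each call.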
  • Briggs at Feb 3, 2008 at 10:11 pm
    Damn, really? I haven't had the opportunity to test this yet. Has
    anyone else seen this kind of improvement?



    --
    "Conscious decisions by conscious minds are what make reality real"

  • Jake Mannix at Feb 4, 2008 at 5:26 am
    Note that in particular, we use the StandardTokenizer as part of our
    analyzer chain, which means we pick up the switch from the JavaCC version
    to the JFlex-based code, which I'm betting is a substantial part of that
    speedup.

    -jake
  • Ajay_garg at Feb 4, 2008 at 5:02 am
    Hi Jake.

    Was the test conducted with a single indexing thread, or multiple ones?


    --
    View this message in context: http://www.nabble.com/Indexing-Speed%3A-2.3-vs-2.2-%28real-world-numbers%29-tp15257512p15262216.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


  • Jake Mannix at Feb 4, 2008 at 5:24 am
    The test in which we got the 11X speedup? That was single-threaded. I
    haven't yet found a way to make multithreaded (shared IndexWriter) indexing
    perform any faster than single-threaded, so that code is not
    enabled in our tests. Do you think that 2.3 would take better advantage of
    multiple threads / cores? If so, I could rerun it multithreaded and
    see if that's even better...

    -jake
  • Michael McCandless at Feb 4, 2008 at 10:51 am
    Even pre-2.3, you should have seen gains by adding threads, if indeed
    your hardware has good concurrency.

    And definitely with the changes in 2.3, you should see gains by
    adding threads.

    Note that as you add threads, the "sweet spot" for RAM buffer size
    increases. I.e., make the RAM buffer bigger as you add more threads.

    I think the only major thing that's single-threaded is flushing a new
    segment to disk. Only one thread can do that, and while that thread
    is doing so, other threads must wait.

    Mike
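
To make the shape of this concrete, here is a toy model of several threads feeding one shared writer whose flush is serialized. ToySharedWriter is a hypothetical class written for this example and is not Lucene's IndexWriter; the buffer is counted in documents rather than MB purely to keep the sketch short, and the "analysis" step is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy stand-in for a shared writer; NOT Lucene's IndexWriter.
class ToySharedWriter {
    private final int maxBufferedDocs;              // assumed knob, counted in docs
    private final List<String> buffer = new ArrayList<>();
    private int flushedDocs = 0;
    private int flushCount = 0;

    ToySharedWriter(int maxBufferedDocs) { this.maxBufferedDocs = maxBufferedDocs; }

    // Buffering is guarded by the writer's lock; a full buffer triggers a flush.
    synchronized void addDocument(String doc) {
        buffer.add(doc);
        if (buffer.size() >= maxBufferedDocs) flush();
    }

    // Only one thread can be in here at a time; the others wait on the lock.
    synchronized void flush() {
        if (buffer.isEmpty()) return;
        flushedDocs += buffer.size();
        flushCount++;
        buffer.clear();
    }

    synchronized int flushedDocs() { return flushedDocs; }
    synchronized int flushCount() { return flushCount; }

    // Run `threads` workers against one shared writer and return it when done.
    static ToySharedWriter indexAll(int threads, int docsPerThread, int maxBufferedDocs) {
        ToySharedWriter writer = new ToySharedWriter(maxBufferedDocs);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    // Placeholder for per-thread work done outside the lock.
                    String doc = ("Doc " + id + "-" + i).toLowerCase();
                    writer.addDocument(doc);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        writer.flush();  // write out whatever is still buffered
        return writer;
    }
}
```

In this model, indexing 1,000 documents with a buffer of 100 stalls the workers for ten flushes, while a buffer of 500 needs only two. That is one way to read the advice to grow the buffer as threads are added, though the real trade-off depends on available heap.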

  • Daniel Noll at Feb 5, 2008 at 4:59 am

    On Monday 04 February 2008 21:51:39 Michael McCandless wrote:
    Even pre-2.3, you should have seen gains by adding threads, if indeed
    your hardware has good concurrency.

    And definitely with the changes in 2.3, you should see gains by
    adding threads.

    With regard to this, I have been wondering: are there still huge performance
    benefits with using N threads on N IndexWriters, vs. using N threads on a
    single IndexWriter? Or have the optimisations in version 2.3 closed the gap
    for this?

    Daniel

  • Mitchell, Erica at Feb 4, 2008 at 5:18 pm
    Hi,

    I'm trying to test out Luke and I get an error saying unknown format
    error:-4
    The index I'm trying to point to is the one built by the demo in the
    documentation for getting started with Lucene.

    Can anyone please tell me what this error might mean?

    Thanks
    Erica



Discussion Overview
group: java-user
categories: lucene
posted: Feb 3, '08 at 7:58p
active: Feb 5, '08 at 4:59a
posts: 10
users: 6
website: lucene.apache.org
