FAQ
Hi all, I was wondering whether you could give me some advice on how to
improve my search performance.

I have 90 Lucene indexes, each having different fields (~5 per
Document). When I search, I always have to go through all indexes to
build my result set. Searching one index takes approx. 100ms, thus
searching all indexes takes 9s in total.

How can I reduce the time it needs to search?

I decided to create this many indexes because putting all data in one
index would mean that a document would have ~400 fields, with most of
them left empty. Is that ok? Would a single index be faster compared to
multiple small ones?

Any pointers are much appreciated.

Regards,
Alex

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Erick Erickson at Jun 1, 2011 at 9:26 pm
    I'd start by putting them all in one index. There's no penalty
    in Lucene for having empty fields in a document, unlike an
    RDBMS.

    Alternatively, if you're opening and then closing searchers each
    time, that's very expensive. Could you open the searchers
    once and keep them open (all 90 of them)? That alone might
    do the trick and be less of a change to your program. You
    could also fire multiple threads at the searches, but check whether
    you're CPU bound first (if you are, multiple threads won't
    help much, if at all).
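
A minimal sketch of the "open once, keep open" idea, using only the JDK. The `Searcher` class is a hypothetical stand-in for an expensive-to-open handle such as an IndexSearcher (it is not Lucene's API), and the index names are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SearcherCache {
    /** Hypothetical stand-in for an expensive-to-open searcher (not a Lucene class). */
    static final class Searcher {
        final String indexName;
        Searcher(String indexName) { this.indexName = indexName; }
    }

    private final Map<String, Searcher> open = new ConcurrentHashMap<>();
    private final AtomicInteger opens = new AtomicInteger();

    /** Returns the cached searcher for an index, opening it only on first use. */
    Searcher get(String indexName) {
        return open.computeIfAbsent(indexName, name -> {
            opens.incrementAndGet(); // count how often the open cost is paid
            return new Searcher(name);
        });
    }

    int openCount() { return opens.get(); }

    public static void main(String[] args) {
        SearcherCache cache = new SearcherCache();
        // 100 queries, each touching all 90 (made-up) indexes.
        for (int query = 0; query < 100; query++) {
            for (int i = 0; i < 90; i++) {
                cache.get("index-" + i);
            }
        }
        System.out.println("opens: " + cache.openCount()); // prints "opens: 90"
    }
}
```

Each index is opened once no matter how many queries run, so the open cost is paid 90 times in total rather than 90 times per query.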

    You haven't said how big these indexes are nor how many
    documents you're talking about here, so this advice is suspect.

    Do look at putting it all in one index though, and let us know if you
    have some data indicating how big things are or would be.

    Best
    Erick

  • Alexander Rosemann at Jun 1, 2011 at 9:37 pm
    Many thanks for the tips, Erick! I do close each searcher after a
    search... I will change that first thing tomorrow and let you know how
    that goes. Multi-threaded searching will be next, and if that doesn't
    help, I will switch to one big index.
    All indexes together are rather small, ~200MB and 50,000 documents.

    -Alex
  • Shai Erera at Jun 2, 2011 at 3:28 am
    > All indexes together are rather small, ~200MB and 50,000 documents.

    Then I would definitely consider merging them into one index. Even if you
    don't close the searchers, it will still take 90 x N ms to search them
    sequentially, where N is the time in ms to search one index.

    Also, multi-threading will help, but only up to a point, because you
    cannot run 90 searches truly in parallel (unless you have some sort of
    super-computer there).
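
A rough sketch of the multi-threaded variant, assuming a bounded thread pool sized to the machine's cores. `searchIndex` is a hypothetical placeholder that returns a fake hit count; a real version would query one index. With T worker threads and searches of roughly equal cost, the best case is about ceil(90 / T) x N ms rather than 90 x N ms, which is why the gain flattens out once T reaches the number of cores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFanOut {
    /** Hypothetical per-index search; returns a fake hit count for the index. */
    static int searchIndex(int indexId, String query) {
        return (indexId + query.length()) % 7;
    }

    /** Runs the query against every index on a bounded pool and sums the hits. */
    static int searchAll(int indexCount, String query, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < indexCount; i++) {
                final int id = i;
                tasks.add(() -> searchIndex(id, query));
            }
            int total = 0;
            // invokeAll returns once every task has completed.
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("total hits: " + searchAll(90, "lucene", cores));
    }
}
```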

    On the other hand, if you merge them into one index then you'll be talking
    about an index that's <20GB and <5M docs, which is definitely reasonable for
    Lucene, and performance is generally very good (depending, of course, on
    the search application).

    Starting with Lucene 3.1 you can perform your searches in parallel (over one
    index) using IndexSearcher, which comes in handy if your index has multiple
    segments. Look at
    http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/search/IndexSearcher.html#IndexSearcher(org.apache.lucene.index.IndexReader, java.util.concurrent.ExecutorService).

    Having said that, keeping the indexes separate may have advantages that your
    application needs. For example, if those indexes are completely rebuilt very
    frequently, then it's much better to delete an index and rebuild it than to
    delete 50K docs from the merged large index. But that really depends on your
    application's needs.

    I'd say, if you don't see a strong case for keeping them apart, merge them
    into one. Besides performance, there's also index management overhead
    (synchronizing commits, making sure all are closed/opened together, etc.)
    that may simply be unnecessary.

    BTW, in Lucene in Action, 2nd Edition, there's an example class called
    SearcherManager which manages IndexSearcher instances and ensures an
    IndexSearcher instance is closed only after the last thread has released it.
    It can also manage the reopen() logic for you, as well as warming up the
    index. You might want to give it a try too!
    LUCENE-2955 (https://issues.apache.org/jira/browse/LUCENE-2955) makes
    use of it, so you can consult it for examples (it's still not committed).
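
To make the "closed only after the last thread released it" idea concrete, here is a small reference-counting sketch using only the JDK. It is not the book's SearcherManager nor the LUCENE-2955 code; `Searcher`, `acquire`, `release`, and `reopen` are hypothetical names chosen for illustration, and "closing" is reduced to setting a flag.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedSearchers {
    /** Hypothetical stand-in for a searcher that closes when its last reference is dropped. */
    static final class Searcher {
        final int generation;
        private final AtomicInteger refs = new AtomicInteger(1); // the manager's own reference
        private volatile boolean closed;

        Searcher(int generation) { this.generation = generation; }

        /** Adds a reference unless the searcher has already been closed. */
        boolean tryIncRef() {
            int n;
            while ((n = refs.get()) > 0) {
                if (refs.compareAndSet(n, n + 1)) return true;
            }
            return false;
        }

        void decRef() {
            if (refs.decrementAndGet() == 0) closed = true; // real code would release files here
        }

        boolean isClosed() { return closed; }
    }

    private volatile Searcher current = new Searcher(0);

    /** Borrows the current searcher; every call must be paired with release(). */
    Searcher acquire() {
        while (true) {
            Searcher s = current;
            if (s.tryIncRef()) return s;
            // Lost a race with reopen(); loop and pick up the new searcher.
        }
    }

    void release(Searcher s) { s.decRef(); }

    /** Swaps in a fresh searcher; the old one closes once all borrowers have released it. */
    synchronized void reopen() {
        Searcher old = current;
        current = new Searcher(old.generation + 1);
        old.decRef(); // drop the manager's reference
    }

    public static void main(String[] args) {
        RefCountedSearchers manager = new RefCountedSearchers();
        Searcher s = manager.acquire();
        manager.reopen();
        System.out.println(s.isClosed()); // false: still borrowed
        manager.release(s);
        System.out.println(s.isClosed()); // true: last reference released
    }
}
```

In real code the release would normally sit in a `finally` block so that a failed search cannot leak a reference.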

    Hope this helps,
    Shai
  • Alexander Rosemann at Jun 2, 2011 at 7:57 am
    Many, many thanks for the input. I have applied the small change of not
    closing the searchers each time, and search times have already dropped by half!

    I'll try to merge all indexes into a single one next. I'll let you know
    how that goes.

  • Erick Erickson at Jun 2, 2011 at 11:04 am
    At this size, really consider going to a single index. The lack of
    administrative headaches alone is probably well worth the effort....

    I almost guarantee that the time you spend re-writing things to keep
    the searchers open (and finding the bugs!) will be far more than just
    putting all the data in a single index.

    But that might just be my preferences showing....

    Best
    Erick

  • Alexander Rosemann at Jun 2, 2011 at 1:28 pm
    Hi Erick, caching the IndexSearchers didn't take much effort and
    has already cut search time by 30%!

    I am busy changing the code to use a single index as you suggested.
    There are still a few things left to do, but once I have it working I'll
    let you know how much faster it is for me.

    Thanks,
    Alex
  • Erick Erickson at Jun 2, 2011 at 1:36 pm
    Sounds good, just be sure to keep your (now single) searcher open! Also,
    be sure to measure queries after a while. The first few queries will fill up
    caches etc, so the time should improve after the first few.
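
One way to act on this when benchmarking is to time many runs and leave the first few out of the average. The sketch below assumes a warm-up of 10 runs, which is an arbitrary choice, and uses a placeholder `Runnable` where a real search call would go.

```java
public class WarmQueryTimer {
    /** Average of the timings, ignoring the first `warmup` entries. */
    static double averageAfterWarmup(long[] nanos, int warmup) {
        if (warmup < 0 || warmup >= nanos.length) {
            throw new IllegalArgumentException("warmup must be in [0, " + nanos.length + ")");
        }
        long sum = 0;
        for (int i = warmup; i < nanos.length; i++) {
            sum += nanos[i];
        }
        return (double) sum / (nanos.length - warmup);
    }

    /** Times `runs` executions of the query and returns each duration in nanoseconds. */
    static long[] time(Runnable query, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        Runnable fakeQuery = () -> Math.sqrt(System.nanoTime()); // placeholder for a real search
        long[] nanos = time(fakeQuery, 50);
        System.out.printf("first run: %d ns, average after warm-up: %.0f ns%n",
                nanos[0], averageAfterWarmup(nanos, 10));
    }
}
```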

    Best
    Erick

  • Alexander Rosemann at Jun 2, 2011 at 1:49 pm
    No worries, I'll keep that in mind now.
    In addition, I am going to switch to another collector as well. At the
    moment I collect the results and then sort them using the standard
    Collections.sort approach... I have to look at what Lucene offers and
    switch to something else.
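
For comparison with collect-everything-then-sort, here is a sketch of keeping only the best n hits in a bounded min-heap while collecting. It uses only the JDK; `Hit` is a hypothetical type, and this is an illustration of the general technique rather than a description of any particular Lucene collector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNHits {
    /** Hypothetical hit: a document id plus its relevance score. */
    static final class Hit {
        final int docId;
        final float score;
        Hit(int docId, float score) { this.docId = docId; this.score = score; }
    }

    private static final Comparator<Hit> BY_SCORE =
            Comparator.comparingDouble((Hit h) -> h.score);

    /** Keeps only the n best-scoring hits and returns them best first. */
    static List<Hit> topN(Iterable<Hit> hits, int n) {
        // Min-heap on score: the head is always the weakest hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(BY_SCORE);
        for (Hit h : hits) {
            if (heap.size() < n) {
                heap.add(h);
            } else if (n > 0 && h.score > heap.peek().score) {
                heap.poll(); // evict the weakest kept hit
                heap.add(h);
            }
        }
        List<Hit> best = new ArrayList<>(heap);
        best.sort(BY_SCORE.reversed()); // only n items are ever sorted
        return best;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(1, 0.2f), new Hit(2, 0.9f), new Hit(3, 0.5f), new Hit(4, 0.7f));
        for (Hit h : topN(hits, 2)) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

The heap never holds more than n hits, so memory stays bounded and only the survivors are sorted at the end.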

    Thanks,
    Alex
  • Alexander Rosemann at Jun 3, 2011 at 6:39 am
    Alright. With all the changes you suggested, I am down from 9s to <1s.
    Again, many thanks to both of you, Erick and Shai!

    Regards,
    Alex
    On 02.06.2011 15:48, Alexander Rosemann wrote:
    No worries, I'll keep that in mind now.
    In addition, I am going to switch to another collector. At the moment I
    collect all the results and then sort them with the standard
    Collections.sort approach. I have to look at what Lucene offers and
    switch to something else.
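
One common alternative to collecting every hit and sorting the full list is to keep only the best n hits in a bounded heap while iterating. The sketch below shows that technique in plain Java; `TopHits` and `Hit` are illustrative names and this is not Lucene's collector API, only the idea behind a top-n collector.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopHits {
    /** A hit as a (document id, score) pair; illustrative only. */
    static final class Hit {
        final int docId;
        final double score;

        Hit(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Streams over all hits but never holds more than n of them; returns best first. */
    static List<Hit> topN(Iterable<Hit> hits, int n) {
        List<Hit> best = new ArrayList<>();
        if (n <= 0) {
            return best;
        }
        Comparator<Hit> byScore = Comparator.comparingDouble(h -> h.score);
        // Min-heap on score: the weakest of the current top n sits at the head.
        PriorityQueue<Hit> heap = new PriorityQueue<>(n, byScore);
        for (Hit hit : hits) {
            if (heap.size() < n) {
                heap.add(hit);
            } else if (hit.score > heap.peek().score) {
                heap.poll();   // drop the weakest hit
                heap.add(hit); // keep the stronger one
            }
        }
        best.addAll(heap);
        best.sort(byScore.reversed()); // only n elements are sorted at the end
        return best;
    }
}
```

For m hits this costs roughly O(m log n) instead of O(m log m), which matters mostly when n is much smaller than m.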

    Thanks,
    Alex
    On 02.06.2011 15:36, Erick Erickson wrote:
    Sounds good, just be sure to keep your (now single) searcher open! Also,
    be sure to measure queries after a while. The first few queries will
    fill up caches etc., so the time should improve after the first few.
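
A small sketch of "measure after a while": run the query a few times untimed so caches can fill, then time the following runs. `WarmTiming` and its parameters are hypothetical, and the `Runnable` stands in for whatever executes one search.

```java
final class WarmTiming {
    /**
     * Runs the task warmupRuns times without timing it, then returns the
     * mean wall-clock time in nanoseconds over timedRuns further runs.
     */
    static long meanNanosAfterWarmup(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // fills caches; deliberately not measured
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / timedRuns;
    }
}
```

This is a rough wall-clock average, not a rigorous benchmark; it only separates the cold first queries from the steady state.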

    Best
    Erick

    On Thu, Jun 2, 2011 at 9:28 AM, Alexander Rosemann wrote:
    Hi Erick, caching the IndexSearchers didn't take much effort and
    already decreased search time by 30%!

    I am busy changing the code to use a single index, as you suggested.
    There are still a few things left to do, but once I have it working
    I'll let you know how much faster it is for me.

    Thanks,
    Alex
    On 02.06.2011 13:04, Erick Erickson wrote:

    At this size, really consider going to a single index. The lack of
    administrative headaches alone is probably well worth the effort....

    I almost guarantee that the time you spend re-writing things to keep
    the searchers open (and finding the bugs!) will be far more than just
    putting all the data in a single index.

    But that might just be my preferences showing....

    Best
    Erick

    On Wed, Jun 1, 2011 at 5:37 PM, Alexander Rosemann wrote:
    Many thanks for the tips, Erick! I do close each searcher after a
    search... I will change that first thing tomorrow and let you know how
    that went. Multi-threaded searching will be next and, if that hasn't
    helped, I will switch to one big index.
    All indexes together are rather small: ~200MB and 50,000 documents.
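
For the multi-threaded option: searched one after another, 90 indexes at about 100 ms each add up to 90 × 100 ms = 9 s. Fanned out over k threads, and only if the searches are not CPU bound, the wall-clock time would be nearer ceil(90 / k) × 100 ms. A sketch of that fan-out in plain Java follows; `ParallelSearch` and `searchOne` are hypothetical names, with `searchOne` standing in for the per-index search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelSearch {
    /** Runs searchOne for every index on a fixed pool and merges results in index order. */
    static <R> List<R> searchAll(List<String> indexNames,
                                 Function<String, List<R>> searchOne,
                                 int threads) {
        // A long-lived shared pool would be preferable; one is created per
        // call here only to keep the sketch self-contained.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<R>>> tasks = new ArrayList<>();
            for (String name : indexNames) {
                tasks.add(() -> searchOne.apply(name));
            }
            List<R> merged = new ArrayList<>();
            // invokeAll waits for all tasks and returns futures in submission order.
            for (Future<List<R>> done : pool.invokeAll(tasks)) {
                merged.addAll(done.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("a per-index search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The merged list is only concatenated here; ranking across indexes would still need a separate merge step.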

    -Alex
  • Erick Erickson at Jun 3, 2011 at 9:00 pm
    OK, if they're all in a single index, you might also try using Lucene
    sorting. Be aware that the first sort on a field takes extra time to
    warm the caches...

    But note that sorting is for single-valued, un-tokenized fields.
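
To illustrate why the first sort on a field is slower and why the field needs one value per document, here is a plain-Java sketch of a per-field sort-key cache. It is not Lucene's implementation: `SortKeyCache` and `loadValue` are hypothetical, and it assumes every document has exactly one non-null value for the field.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class SortKeyCache {
    private final int docCount;
    private final Map<String, String[]> keysByField = new HashMap<>();
    private int loads = 0;

    SortKeyCache(int docCount) {
        this.docCount = docCount;
    }

    /** Builds one key per document on the first request for a field, then reuses it. */
    private String[] keysFor(String field, IntFunction<String> loadValue) {
        return keysByField.computeIfAbsent(field, f -> {
            loads++; // the expensive "warm-up" pass over all documents
            String[] keys = new String[docCount];
            for (int doc = 0; doc < docCount; doc++) {
                keys[doc] = loadValue.apply(doc); // exactly one key per document
            }
            return keys;
        });
    }

    /** Returns the matching document ids ordered by the field's cached keys. */
    Integer[] sortDocs(int[] matchingDocs, String field, IntFunction<String> loadValue) {
        String[] keys = keysFor(field, loadValue);
        Integer[] sorted = Arrays.stream(matchingDocs).boxed().toArray(Integer[]::new);
        Arrays.sort(sorted, (a, b) -> keys[a].compareTo(keys[b]));
        return sorted;
    }

    /** How many times a field had to be loaded from scratch. */
    int loads() {
        return loads;
    }
}
```

Because the cache holds a single key slot per document, a tokenized or multi-valued field has no well-defined key to put there, which is one way to read the restriction above.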

    Best
    Erick

  • Itamar Syn-Hershko at Jun 10, 2011 at 2:48 pm
    Erick,

    Sorry about reopening this more than a week late...

    You were asking about the size of each index; at what index size would
    you consider splitting into several indices with multiple searches etc.,
    for what reasons, and does it matter which Lucene version is used?

    Thanks :)

    Itamar.


Discussion Overview
group: java-user
categories: lucene
posted: Jun 1, '11 at 8:35p
active: Jun 10, '11 at 2:48p
posts: 12
users: 4
website: lucene.apache.org
