I'm trying to build an index of about 10GB / 10,000,000 docs. My initial
small tests went pretty well, about 40,000 docs indexed in 10 minutes.
However, each time I try to ramp up to larger indexes, the indexing starts
to slow significantly once I hit around 100,000 docs (to only about 6,000
docs in 10 minutes).

I set up a log, and I now see that at around the 100,000-record point it
writes to the log for 10 minutes and then just seems to "do nothing" for
about 10 minutes. Not sure if this is some kind of gearing up to flush?

My settings are SetMergeFactor=1000 and SetMaxBufferedDocs=10000. Is the
buffered-docs value too high? My understanding was the higher the better, as
long as you stayed within the memory limits of the machine.
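
(For context, a minimal sketch of the writer setup being described here; the directory path and analyzer are placeholders rather than details from this post, and it assumes the Lucene.NET 2.9.x API.)

    // Sketch of the configuration described above (Lucene.NET 2.9.x).
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Index;
    using Lucene.Net.Store;

    var dir = FSDirectory.Open(new System.IO.DirectoryInfo(@"C:\my-index"));
    var writer = new IndexWriter(dir,
        new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29),
        true, IndexWriter.MaxFieldLength.UNLIMITED);
    writer.SetMergeFactor(1000);       // allow ~1000 segments per level before merging
    writer.SetMaxBufferedDocs(10000);  // flush buffered docs to disk every 10,000 docs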

I would really appreciate any advice on how to get consistently fast
indexing!


  • Kevin Miller at Oct 9, 2010 at 1:29 pm
    Scott, I had a similar situation when I messed with the merge factor. I restored the merge factor to the default and saw an increase in performance.

    Another time I saw a similar slowdown, my indexer was doing searches against the index itself; as the index gets bigger, those searches get slower.

    Hope this helps.

    Sent from my wee mobile keypad
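
    (For reference, a minimal sketch of going back to the defaults Kevin suggests; "writer" is assumed to be an already-open IndexWriter, and the values shown are the Lucene 2.9 defaults. The RAM-buffer line is added for completeness and is not something Kevin mentioned.)

        // Lucene 2.9 defaults: mergeFactor = 10, and flushing by RAM usage
        // (16 MB) rather than by a fixed document count.
        writer.SetMergeFactor(10);
        writer.SetRAMBufferSizeMB(16.0);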
  • Igor Chirokov at Oct 9, 2010 at 9:56 pm
    Hi Scott,

    I had the same problem. I built a tool that can create any type of index from a SQL Server or Oracle database; the result looks like a Lucene index integrated with the database. I tested it with 10,000,000 records/docs as well. At first the index would not build at all; sometimes it stopped at 3 million docs, sometimes at 6 million, after more than 10 hours. It seems to me the problem is your machine's memory and the GC: you cannot control when your objects get destroyed. I never completely resolved this, but on my test machine (500 MB of RAM, 2.99 GHz) I can now build an index of 10,000,000 records in 1.5 to 2 hours. The tool also writes a log file showing how much time was spent on each batch of records.

    See www.walnutilsoft.com, where you can download the tool and try creating an index from a SQL Server or Oracle DB.

    There is also a searching sample you can use to test the created index.

    Hope this helps.

    Igor Chirokov

  • Digy at Oct 9, 2010 at 10:33 pm
    Calling "Commit" periodically (say, every 10,000 docs) may solve the problem.

    DIGY.
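
    (A minimal sketch of Digy's suggestion; the loop structure and the GetDocuments() source are assumptions for illustration, and "writer" is an already-open IndexWriter.)

        int count = 0;
        foreach (Document doc in GetDocuments())  // GetDocuments(): your own record source
        {
            writer.AddDocument(doc);
            if (++count % 10000 == 0)
                writer.Commit();  // periodically flush buffered docs and make them durable
        }
        writer.Commit();  // final commit at the end of the run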

  • Scott Bundren at Oct 9, 2010 at 11:24 pm
    I'm unable to see the IndexWriter.Commit() method in Visual Studio, so I
    figured it was not ported over to the C# version of Lucene. Should I be
    seeing it there, or should I be looking somewhere else for it?

  • Scott Bundren at Oct 9, 2010 at 11:26 pm
    On a related note, I was also unable to see the Field.SetValue method.
    Again, I figured it was not ported over?
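
    (For reference, a minimal sketch of how Field.SetValue is typically used: reusing one Document/Field pair across records to cut allocations while indexing. The field name and GetRecords() source are made up for illustration, and "writer" is an already-open IndexWriter.)

        // Reuse one Document and one Field instead of allocating per record.
        var doc = new Document();
        var body = new Field("body", "", Field.Store.NO, Field.Index.ANALYZED);
        doc.Add(body);
        foreach (string text in GetRecords())  // GetRecords(): hypothetical record source
        {
            body.SetValue(text);
            writer.AddDocument(doc);
        }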
  • Digy at Oct 10, 2010 at 12:10 am
    Be sure that you are using v2.9.2

    DIGY

  • Scott Bundren at Oct 10, 2010 at 4:15 pm
    DIGY, thanks!! I had a 3-year-old version (I followed the "official binary
    releases" link from the home page, which is a bit confusing). Switching to
    2.9.2 makes it exceedingly fast. I tested several configurations, and the
    defaults seemed fastest, which is cool.

    One small issue that has come up: after downloading the 2.9.2 source and
    compiling it, when I debug my C# app that uses the Lucene DLL, the debugger
    keeps stepping into the actual Lucene code. While I'd like to take a look
    sometime, for now I'd like to turn off this behavior so it only steps
    through my app's code. Does anyone know where to configure this in Visual
    Studio?

    Scott
