Hi,
I got an out-of-memory exception while indexing huge documents (~1GB) in
one thread and optimizing a few other (2 to 3) indexes in different threads.
The max JVM heap size is 512MB, and I'm using Lucene 2.3.0.

Please suggest a way to avoid this exception.

Regards
RSK

  • Erick Erickson at Feb 4, 2008 at 3:53 pm
    ummmm index smaller documents? <G>

    You cannot expect to index a 1G doc with 512M of memory in the JVM.
    The first thing I'd try is upping your JVM memory to the max your machine
    will accept.

    Make sure you flush your IndexWriter before attempting to index this
    document.

    But I would not be surprised if this failed to solve the problem. What's in
    this massive document? Would it be possible to break it up into
    smaller segments and index many sub-documents for this massive doc?
    I also wonder what problem you're trying to solve by indexing this doc.
    Is it a log file? I can't imagine a text document that big. That's like a
    100 volume encyclopedia, and I can't help but wonder whether your users
    would be better served by indexing it in pieces.
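
    For illustration, here is a minimal sketch of that approach against the
    Lucene 2.3 API. The class name, paths, field names, and the ~1MB chunk
    size are only placeholders, not recommendations:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class ChunkedIndexer {
        public static void main(String[] args) throws Exception {
            // Open (or create) the index; the path and analyzer are placeholders.
            IndexWriter writer = new IndexWriter("/path/to/index",
                                                 new StandardAnalyzer(), true);
            // Flush anything already buffered so all available heap can go to
            // the big file (a no-op on a fresh writer).
            writer.flush();

            // Index the 1GB source as many smaller sub-documents rather than
            // as one enormous Document.
            BufferedReader in = new BufferedReader(new FileReader("/path/to/huge.txt"));
            StringBuilder chunk = new StringBuilder();
            String line;
            int part = 0;
            while ((line = in.readLine()) != null) {
                chunk.append(line).append('\n');
                if (chunk.length() >= 1000000) { // roughly 1MB of text per sub-document
                    addPart(writer, part++, chunk.toString());
                    chunk.setLength(0);
                }
            }
            if (chunk.length() > 0) {
                addPart(writer, part, chunk.toString());
            }
            in.close();
            writer.close();
        }

        private static void addPart(IndexWriter writer, int part, String text)
                throws Exception {
            // Each sub-document carries the source name and a part number, so
            // hits can still be traced back to the original file.
            Document doc = new Document();
            doc.add(new Field("source", "huge.txt", Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("part", Integer.toString(part), Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("body", text, Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        }
    }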

    Best
    Erick
  • SK R at Feb 5, 2008 at 2:52 pm
    Hi,
    Thanks for your help, Erick.

    I changed my code to flush the writer before adding the document, which
    helps reduce memory usage. Reducing the merge factor and max buffered
    docs also helped me avoid this OOM error (even though the index size is
    ~1GB).
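
    (For reference, that kind of tuning looks roughly like this against the
    Lucene 2.3 API; the path and the values below are only illustrative.)

    IndexWriter writer = new IndexWriter("/path/to/index",
                                         new StandardAnalyzer(), true);

    // Merge fewer segments at a time (the default merge factor is 10),
    // which lowers the peak memory needed while merging.
    writer.setMergeFactor(5);

    // Bound the RAM the writer may use for buffered documents before it
    // flushes them to disk...
    writer.setRAMBufferSizeMB(32.0);

    // ...or flush after a fixed number of buffered documents instead.
    writer.setMaxBufferedDocs(100);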

    But please clarify the doubts below:

    > Make sure you flush your IndexWriter before attempting to index this
    > document.

    - Is it good to call writer.flush() before adding every document to the
    writer? Doesn't it affect indexing or search performance? Is it
    effectively the same as setting maxBufferedDocs=1?

    Also, please advise which of these is relatively better (takes less time
    & memory):
    (i) create 4 indexes of 250MB each and merge them into a single index
    using writer.addIndexes(..), or
    (ii) create a 1GB index & optimize it?

    Thanks & Regards
    RSK


  • Erick Erickson at Feb 5, 2008 at 3:55 pm
    See below:
    On Feb 5, 2008 9:41 AM, SK R wrote:

    > Hi,
    > Thanks for your help, Erick.
    >
    > I changed my code to flush the writer before adding the document, which
    > helps reduce memory usage. Reducing the merge factor and max buffered
    > docs also helped me avoid this OOM error (even though the index size is
    > ~1GB).
    >
    > But please clarify the doubts below:
    >
    > > Make sure you flush your IndexWriter before attempting to index this
    > > document.
    >
    > - Is it good to call writer.flush() before adding every document to the
    > writer? Doesn't it affect indexing or search performance? Is it
    > effectively the same as setting maxBufferedDocs=1?
    No, this is not a good idea. I'd expect this to slow down indexing
    significantly.
    What I was assuming is that you'd have something like:

    if (incoming document is huge) flush index writer

    just to free up all the memory you can.
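
    A rough sketch of that idea (against the Lucene 2.3 API; the 50MB
    threshold and the use of the source file's length as a size estimate are
    only placeholders):

    // writer is an existing IndexWriter; sourceFile is the java.io.File the
    // incoming document was built from.
    if (sourceFile.length() > 50L * 1024 * 1024) {
        writer.flush(); // release buffered RAM before adding the big one
    }
    writer.addDocument(doc);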

    > Also, please advise which of these is relatively better (takes less
    > time & memory):
    > (i) create 4 indexes of 250MB each and merge them into a single index
    > using writer.addIndexes(..), or
    > (ii) create a 1GB index & optimize it?
    Don't know. You have to measure your particular situation. There's some
    discussion (search the archives) about using several threads to speed up
    indexing. Also, there's the wiki page, see

    http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

    The first bullet point is important here. Do you really need to improve
    indexing speed? How long does it take, and how often do you build it?
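
    For what it's worth, the merge step of option (i) would look roughly like
    this against the Lucene 2.3 API (the directory paths are illustrative):

    Directory[] parts = new Directory[] {
        FSDirectory.getDirectory("/path/idx1"),
        FSDirectory.getDirectory("/path/idx2"),
        FSDirectory.getDirectory("/path/idx3"),
        FSDirectory.getDirectory("/path/idx4")
    };
    IndexWriter merged = new IndexWriter("/path/merged",
                                         new StandardAnalyzer(), true);
    merged.addIndexesNoOptimize(parts); // merge the four part-indexes into one
    merged.optimize();                  // collapse the result into a single segment
    merged.close();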

    But perhaps I misread your original post. I *thought* you were talking
    about indexing a 1G *document*. The size of the index shouldn't matter as
    far as an OOM error is concerned. But now that I re-read your original
    post, I should also have suggested that you optimize in a different
    process than the one you index in, since the implication is that they are
    separate indexes anyway.

    Best
    Erick


Discussion Overview
group: java-user @ lucene.apache.org
categories: lucene
posted: Feb 4, '08 at 3:26p
active: Feb 5, '08 at 3:55p
posts: 4
users: 2 (Erick Erickson: 2 posts, SK R: 2 posts)
