Lucene1.4.1 + OutOfMemory

Hi Guys,

Apologies...



History

1st setup: 40,000 subindexes + MultiSearcher + search on the Content field,
for only 2,000 hits

= Exception [ Too many files open ]



2nd setup: 40 merged indexes [1,000 subindexes each] + MultiSearcher
/ ParallelSearcher + search on the Content field only, for 20,000 hits

= Exception [ OutOfMemory ]
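
For concreteness, the setup described above looks roughly like this (a
minimal sketch; the index paths and the loop bound are illustrative
assumptions, not code from this post):

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Searchable;

    public class SetupSketch {
      public static void main(String[] args) throws Exception {
        // 1st setup: one IndexSearcher per subindex, all wrapped in a
        // MultiSearcher. Each open IndexSearcher holds file descriptors for
        // every file of every segment, so 40,000 of them easily exhausts
        // the per-process open-file limit.
        Searchable[] subs = new Searchable[40000];
        for (int i = 0; i < subs.length; i++) {
          subs[i] = new IndexSearcher("/indexes/sub" + i); // hypothetical paths
        }
        MultiSearcher searcher = new MultiSearcher(subs);
      }
    }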



System config [same for both setups]

AMD processor [high-end, single CPU]
RAM: 1 GB
OS: Linux (jantoo type)
App server: Tomcat 5.05
JDK: IBM Blackdown-1.4.1-01 (== JDK 1.4.1)

Index contains 15 fields
Search is done on only 1 field
Retrieve 11 corresponding fields
3 fields are for debug details


Switched from the 1st setup to the 2nd setup.

Can somebody suggest why this is happening?

Thanks in advance




WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]





  • Yahootintin-lucene at Nov 10, 2004 at 7:16 am
    There is a memory leak in the sorting code of Lucene 1.4.1.
    1.4.2 has the fix!

  • Karthik N S at Nov 10, 2004 at 8:32 am
    Hi Guys,

    Apologies...

    I am NOT using the sorting code

        hits = multiSearcher.search(query,
            new Sort(new SortField("filename", SortField.STRING)));

    but rather

        multiSearcher.search(query)

    in the core files setup, and I am still getting the error.

    More advice required...

    Karthik



  • Iouli Golovatyi at Nov 10, 2004 at 8:47 am
    Exception "too many files open means":
    - searcher object is nor closed after query execution
    - too little file handlers

    Regards
    J.
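
    A minimal sketch of the first point above (closing the searcher after
    each query; the index path and query string are illustrative
    assumptions):

        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.queryParser.QueryParser;
        import org.apache.lucene.search.Hits;
        import org.apache.lucene.search.IndexSearcher;
        import org.apache.lucene.search.Query;

        public class CloseSearcherSketch {
          public static void main(String[] args) throws Exception {
            IndexSearcher searcher =
                new IndexSearcher("/indexes/merged0"); // hypothetical path
            try {
              Query query =
                  QueryParser.parse("foo", "contents", new StandardAnalyzer());
              Hits hits = searcher.search(query);
              System.out.println(hits.length() + " hits");
            } finally {
              searcher.close(); // releases the underlying file handles
            }
          }
        }

    For the second point, the per-process limit can be raised (e.g. with
    ulimit -n on Linux), but a searcher that is never closed will exhaust
    any limit eventually.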



    "Karthik N S"
    <karthik@controln To: "Lucene Users List" <lucene-user@jakarta.apache.org>,
    et.co.in> <yahootintin-lucene@yahoo.com>
    cc: (bcc: Iouli Golovatyi/X/GP/Novartis)
    10.11.2004 09:41 Subject: RE: Lucene1.4.1 + OutOf Memory
    Please respond to
    "Lucene Users Category: |-------------------------|
    List" | ( ) Action needed |
    ( ) Decision needed |
    ( ) General Information |
    -------------------------|





  • Karthik N S at Nov 10, 2004 at 9:34 am
    Hi Guys,

    Apologies...

    That's why somebody on the forum asked me to switch to:

        40 merged indexes [1,000 subindexes each] + MultiSearcher /
        ParallelSearcher + search on the Content field only, for 20,000 hits

    The problem of too many files open was solved, since now there were only
    40 merged indexes [1 merged index holds 1,000 subindexes] instead of
    40,000 subindexes.

    Now I am getting the OutOfMemory exception...

    Any idea on how to solve this problem?

    Thanks in advance






  • Erik Hatcher at Nov 10, 2004 at 9:34 am

    On Nov 10, 2004, at 1:55 AM, Karthik N S wrote:

    > Hi Guys
    >
    > Apologies..........

    No need to apologize for asking questions.

    > History
    >
    > Ist type : 40000 subindexes + MultiSearcher + Search on Content Field

    You've got 40,000 indexes aggregated under a MultiSearcher and you're
    wondering why you're running out of memory?! :O

    > Exception [ Too many Files Open ]

    Are you using the compound file format?

    Erik


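
    The compound file format Erik asks about is toggled on the IndexWriter.
    A minimal sketch (the path and analyzer choice are illustrative
    assumptions):

        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.index.IndexWriter;

        public class CompoundFormatSketch {
          public static void main(String[] args) throws Exception {
            // true = create a new index at this (hypothetical) path
            IndexWriter writer =
                new IndexWriter("/indexes/sub0", new StandardAnalyzer(), true);
            // The compound format packs each segment's many files into a
            // single .cfs file, greatly reducing open file descriptors
            // per index.
            writer.setUseCompoundFile(true);
            writer.close();
          }
        }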
  • Rupinder Singh Mazara at Nov 10, 2004 at 9:39 am
    hi all

    I had a similar problem with jdk1.4.1; Doug had sent me a patch, which I
    am attaching. The following is the mail from Doug:

    It sounds like the ThreadLocal in TermInfosReader is not getting
    correctly garbage collected when the TermInfosReader is collected.
    Researching a bit, this was a bug in JVMs prior to 1.4.2, so my guess is
    that you're running in an older JVM. Is that right?

    I've attached a patch which should fix this. Please tell me if it works
    for you.

    Doug

    Daniel Taurat wrote:
    Okay, that (1.4rc3) worked fine, too!
    Got only 257 SegmentTermEnums for 1900 objects.

    Now I will go for the final test on the production server with the
    1.4rc3 version and about 40,000 objects.

    Daniel

    Daniel Taurat wrote:
    Hi all,
    here is some update for you:
    I switched back to Lucene 1.3-final and now the number of the
    SegmentTermEnum objects is controlled by gc again:
    it goes up to about 1000 and then it is down again to 254 after
    indexing my 1900 test-objects.
    Stay tuned, I will try 1.4RC3 now, the last version before FieldCache
    was introduced...

    Daniel


    Rupinder Singh Mazara wrote:
    hi all
    I had a similar problem: I have a database of documents with 24
    fields and an average content of 7K, with 16M+ records.

    I had to split the job into slabs of 1M each and merge the
    resulting indexes; submissions to our job queue looked like

    java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22

    and I still had OutOfMemory exceptions. The solution that I created
    was, after every 200K documents, to create a temp directory and merge
    them together; this was done for the first production run. Updates
    are now being handled incrementally.
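
    A minimal sketch of that merge step (the slab paths, target path, and
    analyzer are illustrative assumptions):

        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.store.Directory;
        import org.apache.lucene.store.FSDirectory;

        public class MergeSlabsSketch {
          public static void main(String[] args) throws Exception {
            // Merge several temp slab indexes into one target index.
            Directory[] slabs = new Directory[] {
              FSDirectory.getDirectory("/tmp/slab0", false), // hypothetical paths
              FSDirectory.getDirectory("/tmp/slab1", false),
            };
            IndexWriter writer =
                new IndexWriter("/indexes/merged", new StandardAnalyzer(), true);
            writer.addIndexes(slabs); // copies and merges the slab segments
            writer.optimize();        // collapse to a single segment
            writer.close();
          }
        }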



    Exception in thread "main" java.lang.OutOfMemoryError
        at org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled Code))
        at org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined Compiled Code))
        at org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined Compiled Code))
        at org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled Code))
        at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled Code))
        at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled Code))
        at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled Code))
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled Code))
        at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled Code))
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
        at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
        at lucene.Indexer.main(CDBIndexer.java:168)


    -----Original Message-----
    From: Daniel Taurat
    Sent: 10 September 2004 14:42
    To: Lucene Users List
    Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large
    number of documents


    Hi Pete,
    Good hint, but we actually do have 4 GB of physical memory on the
    system. But then, we have also experienced that the gc of the IBM
    jdk1.3.1 that we use sometimes behaves strangely with too large a
    heap space anyway (the limit seems to be 1.2 GB).
    I can say that gc is not collecting these objects, since I forced gc
    runs every now and then while indexing (when parsing pdf-type
    objects, that is): no effect.

    regards,

    Daniel


    Pete Lewis wrote:


    Hi all

    Reading the thread with interest; there is another way I've come across
    out-of-memory errors when indexing large batches of documents.

    If you have your heap space settings too high, then you get swapping
    (which impacts performance), plus you never reach the trigger for
    garbage collection, hence you don't garbage collect and hence you run
    out of memory.

    Can you check whether or not your garbage collection is being
    triggered?

    Anomalously, therefore, if this is the case, by reducing the heap space
    you can improve performance and get rid of the out-of-memory errors.

    Cheers
    Pete Lewis

    ----- Original Message -----
    From: "Daniel Taurat" <daniel.taurat@gaussvip.com>
    To: "Lucene Users List" <lucene-user@jakarta.apache.org>
    Sent: Friday, September 10, 2004 1:10 PM
    Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large
    number of documents





    Daniel Aber wrote:

    On Thursday 09 September 2004 19:47, Daniel Taurat wrote:

    > I am facing an out of memory problem using Lucene 1.4.1.

    Could you try with a recent CVS version? There has been a fix about
    files not being deleted after 1.4.1. Not sure if that could cause the
    problems you're experiencing.

    Regards
    Daniel



    Well, it seems not to be files; it looks more like those
    SegmentTermEnum objects accumulating in memory.
    I've seen some discussion of these objects in the developer newsgroup
    that took place some time ago.
    I am afraid this is some kind of runaway caching I have to deal with.
    Maybe not correctly addressed in this newsgroup, after all...

    Anyway: any idea if there is an API command to re-init caches?

    Thanks,

    Daniel





  • Karthik N S at Nov 10, 2004 at 11:32 am
    Hi Rupinder Singh Mazara,

    Apologies...

    Can you paste the code into the mail instead of an attachment?

    [ Because I am not able to get the attachment on the company's mail ]

    Thanks in advance
    Karthik


  • Rupinder Singh Mazara at Nov 10, 2004 at 11:43 am
    Karthik,

    I think the core problem in your case is the use of compound files; it
    would be best to switch it off, or alternatively to issue an optimize
    as soon as the indexing is over.
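
    A minimal sketch of that advice (the index path and analyzer are
    illustrative assumptions; an existing index is opened with
    create = false):

        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.index.IndexWriter;

        public class NonCompoundSketch {
          public static void main(String[] args) throws Exception {
            // Open an existing (hypothetical) index for appending.
            IndexWriter writer =
                new IndexWriter("/indexes/merged0", new StandardAnalyzer(), false);
            writer.setUseCompoundFile(false); // write plain multi-file segments
            // ... add documents here ...
            writer.optimize(); // merge down to a single segment when done
            writer.close();
          }
        }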

    I am copying the file contents between <file> tags; the patch is to be
    applied to TermInfosReader.java. This was done to help with
    out-of-memory exceptions while doing indexing.
    <file>
    Index: src/java/org/apache/lucene/index/TermInfosReader.java
    ===================================================================
    RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/TermInfosReader.java,v
    retrieving revision 1.9
    diff -u -r1.9 TermInfosReader.java
    --- src/java/org/apache/lucene/index/TermInfosReader.java 6 Aug 2004 20:50:29 -0000 1.9
    +++ src/java/org/apache/lucene/index/TermInfosReader.java 10 Sep 2004 17:46:47 -0000
    @@ -45,6 +45,11 @@
         readIndex();
       }

    +  protected final void finalize() {
    +    // patch for pre-1.4.2 JVMs, whose ThreadLocals leak
    +    enumerators.set(null);
    +  }
    +
       public int getSkipInterval() {
         return origEnum.skipInterval;
       }
    </file>



    However, Tomcat does react in strange ways to too many open files.
    Try to restrict the number of IndexReader or Searchable objects
    that you create while doing searches;
    I usually keep one object to handle all my user requests:
    public static Searcher fetchCitationSearcher(HttpServletRequest request)
        throws Exception {
      Searcher rval = (Searcher) request.getSession().getServletContext()
          .getAttribute("luceneSearchable");
      if (rval == null) {
        rval = new IndexSearcher(fetchCitationReader(request));
        request.getSession().getServletContext()
            .setAttribute("luceneSearchable", rval);
      }
      return rval;
    }
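
    An IndexSearcher is thread-safe for concurrent searches, so sharing one
    instance across requests like this caps file-handle usage; the shared
    searcher only needs to be replaced (and the old one closed) when the
    index changes.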




  • Karthik N S at Nov 10, 2004 at 11:30 am
    Hi Guys,

    Apologies...

    Yes Erik,

    Since the day I switched from Lucene 1.3.1 to Lucene 1.4.1 we have been
    using the compound file format:

        writer.setUseCompoundFile(true);

    Some more advice please.

    Thanks in advance



    Discussion Overview
    group: java-user
    categories: lucene
    posted: Nov 10, '04 at 6:46a
    active: Nov 10, '04 at 11:43a
    posts: 10
    users: 5
    website: lucene.apache.org
