Sorry, but I am fairly certain you are mistaken.

If you only have a single IndexReader, the RAMDirectory will be
shared in all cases.

The only memory growth is any buffer space allocated by an IndexInput
(used in many places and cached).

Normally the IndexInput instances created by a RAMDirectory do not
have any buffer allocated, since the underlying store is already in memory.

You have some other problem in your code...
On Sep 10, 2008, at 1:10 AM, Chris Lu wrote:

Actually, even if I only use one IndexReader, some resources are
cached via the ThreadLocal cache and cannot be released unless
every thread performs the close.

SegmentTermEnum itself is small, but through its reference path it
holds on to the RAMDirectory, which is big.

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per
request) got 2.6 Million Euro funding!

On Tue, Sep 9, 2008 at 10:43 PM, robert engels wrote:
You do not need a pool of IndexReaders...

It does not matter what class it is, what matters is the class that
ultimately holds the reference.

If the IndexReader is never closed, the SegmentReader(s) are never
closed, so the thread local in TermInfosReader is not cleared
(because the thread never dies). So you will get one
SegmentTermEnum per thread per segment.
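
A minimal, runnable sketch of the arithmetic above, using hypothetical stand-in classes (`SegmentStub`, `TermEnumStub`) rather than Lucene's own: each segment caches one enumerator per thread in a `ThreadLocal`, so the number of instances is threads times segments, no matter how many lookups each thread performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: each "segment" keeps a ThreadLocal with one enumerator
// per thread. These are stand-ins, not Lucene's TermInfosReader/SegmentTermEnum.
class PerThreadSegmentCache {
    // Total number of enumerator instances created so far.
    static final AtomicInteger created = new AtomicInteger();

    // Stand-in for a per-thread SegmentTermEnum.
    static final class TermEnumStub {
        TermEnumStub() { created.incrementAndGet(); }
    }

    // Stand-in for a per-segment TermInfosReader with a per-thread cache.
    static final class SegmentStub {
        private final ThreadLocal<TermEnumStub> perThread =
                ThreadLocal.withInitial(TermEnumStub::new);

        TermEnumStub termEnum() { return perThread.get(); }
    }

    // Starts `threads` fresh threads; each looks up every segment's enumerator
    // several times. Returns how many enumerators were created in total.
    static int countAfter(int threads, int segments) {
        created.set(0);
        List<SegmentStub> segs = new ArrayList<>();
        for (int s = 0; s < segments; s++) {
            segs.add(new SegmentStub());
        }
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int repeat = 0; repeat < 5; repeat++) {
                    for (SegmentStub seg : segs) {
                        seg.termEnum(); // created on first use, reused afterwards
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return created.get();
    }

    public static void main(String[] args) {
        // One instance per thread per segment, regardless of the number of lookups.
        System.out.println(countAfter(4, 3)); // prints 12
    }
}
```

In this sketch the worker threads die after `join()`, so their entries become collectable; in an application server that reuses threads, the entries could stay reachable for as long as the pooled threads live.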

The SegmentTermEnum is not a large object, so even with 100
threads and 100 segments, giving 10k instances, it seems hard to
believe that is the source of your memory issue.

The SegmentTermEnum is cached per thread because it is used to
enumerate the terms; without a per-thread cache, multiple threads
reading the index would cause lots of random access, which is very slow.

Keep in mind: if every thread were executing a search
simultaneously, you would still have 100x100 SegmentTermEnum
instances anyway! The only way to prevent that would be to create
and destroy the SegmentTermEnum on each call (opening it and seeking
to the proper spot), which would be SLOW SLOW SLOW.

On Sep 10, 2008, at 12:19 AM, Chris Lu wrote:

I have tried to create an IndexReader pool and dynamically create
searchers, but the memory leak is the same. It's not related to the
Searcher class specifically, but to the SegmentTermEnum in
TermInfosReader.

On Tue, Sep 9, 2008 at 10:14 PM, robert engels wrote:
A Searcher uses an IndexReader; it is the IndexReader that is slow
to open, not the Searcher. And Searchers can share an IndexReader.

You want to create a single shared (across all threads/users)
IndexReader (usually), and create a Searcher as needed and
dispose of it. It is VERY CHEAP to create the Searcher.

I am fairly certain the javadoc on Searcher is incorrect. The
warning "For performance reasons it is recommended to open only
one IndexSearcher and use it for all of your searches" is not true
in the case where an IndexReader is passed to the ctor.

Any caching should USUALLY be performed at the IndexReader level.

You are most likely using the "path" ctor, and that is the source
of your problems: it creates multiple IndexReader instances, hence
the memory use.
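
A minimal sketch of the pattern recommended above: open one shared reader, create a cheap searcher per request, and close that searcher on the request's own thread. To keep it runnable without Lucene on the classpath, `ReaderStub` and `SearcherStub` are hypothetical stand-ins; in Lucene 2.x the corresponding calls would be, as we understand the API, `IndexReader.open(directory)` once and `new IndexSearcher(reader)` per request.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins: ReaderStub models an expensive-to-open IndexReader,
// SearcherStub models a cheap IndexSearcher that wraps a shared reader.
class SharedReaderPattern {
    static final AtomicInteger readerOpens = new AtomicInteger();
    static final AtomicInteger searchersClosed = new AtomicInteger();

    static final class ReaderStub {
        ReaderStub() { readerOpens.incrementAndGet(); } // the costly part in real life
    }

    static final class SearcherStub implements AutoCloseable {
        private final ReaderStub reader;

        SearcherStub(ReaderStub reader) { this.reader = reader; } // cheap: just wraps the reader

        int search(String query) {
            // Placeholder "hit count"; a real searcher would consult the reader here.
            return reader == null ? 0 : query.length();
        }

        @Override
        public void close() {
            // Closes only the searcher; the shared reader stays open.
            searchersClosed.incrementAndGet();
        }
    }

    // Opened once and shared by all request threads.
    static final ReaderStub SHARED_READER = new ReaderStub();

    // One request: create a searcher, use it, close it on the same thread.
    static int handleRequest(String query) {
        try (SearcherStub searcher = new SearcherStub(SHARED_READER)) {
            return searcher.search(query);
        }
    }

    // Simulates many requests served by a small pool of reused threads.
    static void runRequests(int requests) {
        searchersClosed.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < requests; i++) {
            final String query = "query" + i;
            pool.execute(() -> handleRequest(query));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("requests did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        runRequests(100);
        System.out.println(readerOpens.get() + " reader open(s), "
                + searchersClosed.get() + " searchers closed");
    }
}
```

The design point is that only the expensive object is long-lived and shared, while everything created per request is also closed on that request's thread.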

On Sep 9, 2008, at 11:44 PM, Chris Lu wrote:

In a J2EE environment, there is usually a searcher pool with
several searchers open.
Opening a large index for every user is unacceptably slow.

On Tue, Sep 9, 2008 at 9:03 PM, robert engels wrote:
You need to close the searcher within the thread that is using
it, in order to have it cleaned up quickly... usually right after
you display the page of results.

If you are keeping multiple searcher refs across multiple threads
for paging/whatever, you have not coded it correctly.

Imagine 10,000 users - storing a searcher for each one is not
going to work...

On Sep 9, 2008, at 10:21 PM, Chris Lu wrote:

Right, in a sense I cannot release it from another thread. But
that's the problem.

It's a J2EE environment; all threads are more or less equal. It's
simply not possible to iterate through all threads to close the
searcher and thus release the ThreadLocal cache.
Unless Lucene is not recommended for J2EE environments, this has
to be fixed.
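
The core of this disagreement can be reproduced with plain JDK classes. A minimal sketch, assuming a hypothetical `CACHE` ThreadLocal that stands in for the cached SegmentTermEnum: a value set by a pooled worker survives after its task ends, `remove()` called from another thread does not touch it, and only cleanup on the owning thread releases it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Plain-JDK illustration: a ThreadLocal value set on a pooled worker thread
// outlives its task, and remove() only ever affects the calling thread.
class PooledThreadLocalDemo {
    // Hypothetical per-thread cache standing in for the cached SegmentTermEnum.
    static final ThreadLocal<byte[]> CACHE = new ThreadLocal<>();

    // Returns {value survived a remove() from another thread,
    //          value was cleared by a remove() on the owning thread}.
    static boolean[] run() {
        // A single reused worker, like an application-server request thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Request 1: the worker fills its own slot (1 MB stands in for a big resource).
            pool.submit(() -> CACHE.set(new byte[1 << 20])).get();

            // "Release" from the main thread: this clears only the main thread's slot.
            CACHE.remove();

            // Request 2 runs on the same reused worker; the old value is still there.
            boolean survivedForeignRemove = pool.submit(() -> CACHE.get() != null).get();

            // Cleanup performed on the owning thread does release it.
            pool.submit(CACHE::remove).get();
            boolean clearedByOwner = pool.submit(() -> CACHE.get() == null).get();

            return new boolean[] { survivedForeignRemove, clearedByOwner };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("survived foreign remove: " + result[0]
                + ", cleared by owner: " + result[1]);
    }
}
```

Both flags come out `true`, which is consistent with both sides of the thread: the memory is released when the owning thread cleans up, and cannot be released from any other thread.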

On Tue, Sep 9, 2008 at 8:14 PM, robert engels wrote:
Your code is not correct. You cannot release it on another
thread: the first thread may create hundreds or thousands of
instances before the other thread ever runs...

On Sep 9, 2008, at 10:10 PM, Chris Lu wrote:

If I release it on the thread that created the searcher, by
setting searcher=null, everything is fine: the memory is
released very cleanly.
My load test repeatedly creates a searcher on a RAMDirectory and
releases it on another thread. The test quickly runs out of memory
(OOM) after several runs. I set the heap size to 1024M, and the
RAMDirectory is 250M. Using a profiling tool, I could see the used
heap size clearly step up by 250M.

I think we should not rely on something that's a "maybe"
behavior, especially for a general purpose library.

Since it's a multi-threaded environment, the thread that creates
the entries in the LRU cache may not go away quickly (in fact most,
if not all, application servers reuse threads). So the LRU cache,
which uses the thread as the key, cannot be released, and neither
can the SegmentTermEnum held in the same class.

And yes, I close the RAMDirectory, and the fileMap is released.
I verified that through the profiler by directly checking the
values in the snapshot.

I'm pretty sure the reference tree wasn't like this with the code
before this commit, because after closing the searcher in another
thread, the RAMDirectory disappeared completely from the memory
snapshot.

On Tue, Sep 9, 2008 at 5:03 PM, Michael McCandless wrote:

Chris Lu wrote:

The problem should be similar to what's discussed in this
thread:
http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal

The "rough" conclusion of that thread is that, technically,
this isn't a memory leak but rather a "delayed freeing"
problem. That is, it may take longer, possibly much longer, than
you want for the memory to be freed.


There is a memory leak in Lucene search introduced by LUCENE-1195
(svn r659602, May 23, 2008).

This patch adds a ThreadLocal cache to TermInfosReader.

One thing that confuses me: TermInfosReader was already using a
ThreadLocal to cache the SegmentTermEnum instance. What was
added in this commit (for LUCENE-1195) was an LRU cache storing
Term -> TermInfo instances. But it seems like it's the
SegmentTermEnum instance that you're tracing below.
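
A minimal sketch of a per-thread LRU cache of the kind described above (Term -> TermInfo), built on `LinkedHashMap` in access order. `String` and `Long` stand in for `Term` and `TermInfo`; the capacity of 2 and `expensiveLookup` are hypothetical, and this is not the code from LUCENE-1195.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a per-thread LRU cache (Term -> TermInfo).
// String and Long stand in for Lucene's Term and TermInfo; this is not the
// LUCENE-1195 code.
class PerThreadLruCache {
    // LinkedHashMap in access order evicts the least recently used entry.
    static final class Lru<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        Lru(int capacity) {
            super(16, 0.75f, true); // true = access order: get() moves an entry to the end
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // drop the eldest (least recently used) entry
        }
    }

    // One cache per thread: no locking is needed, but each map lives as long as
    // its thread does unless remove() is called on that same thread.
    static final ThreadLocal<Lru<String, Long>> CACHE =
            ThreadLocal.withInitial(() -> new Lru<String, Long>(2));

    // Hypothetical placeholder for the expensive term dictionary lookup.
    static long expensiveLookup(String term) {
        return term.length();
    }

    static long termInfo(String term) {
        Lru<String, Long> cache = CACHE.get();
        Long cached = cache.get(term);
        if (cached != null) {
            return cached;
        }
        long info = expensiveLookup(term);
        cache.put(term, info);
        return info;
    }

    public static void main(String[] args) {
        termInfo("apache");
        termInfo("lucene");
        termInfo("apache"); // hit: "apache" becomes most recently used
        termInfo("search"); // miss: evicts "lucene"
        System.out.println(CACHE.get().keySet()); // prints [apache, search]
    }
}
```

Keeping the cache per thread avoids synchronization on every lookup, at the cost discussed in this thread: the entries belong to the thread, not to the reader.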


It's usually recommended to keep the reader open and reuse it
when possible. In a typical J2EE application, HTTP requests are
usually handled by different threads. But since the cache is
ThreadLocal, it is not really usable by other threads. What's
worse, the cache cannot be cleared by another thread!

This leak is usually not so obvious. But my case uses a
RAMDirectory holding several hundred megabytes, so even one
unreleased resource is obvious to me.

Here is the reference tree:
org.apache.lucene.store.RAMDirectory
- directory of org.apache.lucene.store.RAMFile
- file of org.apache.lucene.store.RAMInputStream
- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
- input of org.apache.lucene.index.SegmentTermEnum
- value of java.lang.ThreadLocal$ThreadLocalMap$Entry

So you have a RAMDir that has several hundred MB stored in it,
that you're done with yet through this path Lucene is keeping
it alive?

Did you close the RAMDir? (Closing it nulls its fileMap, which
should also free your memory.)

Also, that reference tree doesn't show the ThreadResources
class that was added in that commit; are you sure this
reference tree doesn't predate the commit?

Mike

Discussion Overview
group: java-dev
categories: lucene
posted: Sep 9, '08 at 6:58p
active: Sep 14, '08 at 3:12a
posts: 55
users: 6
website: lucene.apache.org
