Thanks for the feedback, guys. The evidence I have collected does point to an
issue either in the Java WeakHashMap implementation or in Lucene's use of
it. In particular, I used reflection to replace the WeakHashMap instances
with my own dummy Map that does a no-op for the put operation, and although
this caused far more garbage to be created for each search query, the CMS
executions always collected that garbage and brought the tenured space
usage down to a small value.
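The swap was along these lines (a minimal sketch; NoOpMap and swapMapField
are illustrative names, not anything in Lucene, and the code that locates
FieldCacheImpl's cache instances is omitted):

import java.lang.reflect.Field;
import java.util.AbstractMap;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// A Map whose put() silently discards entries, so nothing is ever cached.
class NoOpMap extends AbstractMap<Object, Object> {
    @Override
    public Object put(Object key, Object value) {
        return null; // drop the entry
    }
    @Override
    public Set<Map.Entry<Object, Object>> entrySet() {
        return Collections.emptySet();
    }
}

class CacheSwapper {
    // Replace a private Map field (e.g. "readerCache") on a cache instance.
    static void swapMapField(Object target, String fieldName) throws Exception {
        Field f = target.getClass().getDeclaredField(fieldName);
        f.setAccessible(true);
        f.set(target, new NoOpMap());
    }
}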

Before that, I tried calling clear() on the WeakHashMap instances and then
calling size(), which calls expungeStaleEntries(), but that didn't help. There
may indeed be a case of double memory usage for a while (until enough CMS
executions have run), as Tom pointed out, but I don't fully understand what's
happening yet.

Also, bizarrely, during my introspection of the readerCache WeakHashMap
instances I found an instance of
org.apache.lucene.index.CompoundFileReader$CSIndexInput as a key. Looking at
the code I don't see how anything other than an IndexReader could possibly
get there, and this class certainly doesn't extend IndexReader. Any ideas?
:-) Maybe I need to get more sleep.

Btw, is searching with a sort by a text field not a common use case for
Lucene? I've been testing with only 1GB indexes, and I'm pretty sure there
are much larger indexes out there.

Cheers,
TCK




On Mon, Dec 7, 2009 at 7:57 PM, Tom Hill wrote:

Hey, that's a nice little class! I hadn't seen it before. But it sounds like
its asynchronous cleanup might deal with the problem I mentioned above
(though I haven't looked at the code yet).

It's Apache-licensed, but you mentioned something about no third-party
libraries. Is that a policy for Lucene?

Thanks,

Tom


On Mon, Dec 7, 2009 at 4:44 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
I wonder if the Google Collections concurrent map, which supports weak
keys, handles the removal of weakly referenced keys in a more elegant way
than Java's WeakHashMap (even though we don't use third-party libraries)?
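Something like this, for reference (a sketch against the Google Collections
MapMaker API; note that weakKeys() compares keys by identity, and stale
entries get cleaned up as a side effect of normal map operations):

import com.google.common.collect.MapMaker;
import java.util.concurrent.ConcurrentMap;

public class WeakKeyCacheExample {
    public static void main(String[] args) {
        // Weak-keyed concurrent map: an entry becomes collectible once its
        // key is no longer strongly referenced anywhere else.
        ConcurrentMap<Object, Object> cache = new MapMaker()
                .weakKeys()
                .makeMap();
        Object key = new Object();
        cache.put(key, "cached value");
        key = null; // the entry may now be reclaimed after a future GC
    }
}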
On Mon, Dec 7, 2009 at 4:38 PM, Tom Hill wrote:
Hi -

If I understand correctly, WeakHashMap does not free the memory for the
value (the cached data) when the key is nulled, or even when the key is
garbage collected.

It requires one more step: a method on the WeakHashMap must be called to
allow it to release its hard reference to the cached data. It appears that
most methods in WeakHashMap end up calling expungeStaleEntries(), which
will clear the hard reference. But you have to call some method on the map
before the memory is eligible for garbage collection.

So it requires four stages to free the cached data: null the key; a GC to
release the weak reference to the key; a call to some method on the map;
then the next GC cycle should free the value.
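A tiny standalone program shows the sequence (a sketch; GC timing isn't
guaranteed, so treat the behavior as indicative):

import java.util.Map;
import java.util.WeakHashMap;

public class WeakHashMapDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Object, byte[]> map = new WeakHashMap<Object, byte[]>();
        Object key = new Object();
        map.put(key, new byte[10 * 1024 * 1024]); // a 10 MB "cache entry"

        key = null;         // stage 1: null the key
        System.gc();        // stage 2: GC clears the weak reference to the key
        Thread.sleep(100);  // give the reference queue a moment

        // Stage 3: until some method touches the map, its internal entry
        // still holds a hard reference to the 10 MB value.
        System.out.println("size() = " + map.size()); // triggers the expunge

        System.gc();        // stage 4: now the value itself can be collected
    }
}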

So it seems possible that you could end up with double memory usage for a
time. If you don't have a GC between the time that you close the old reader
and the time you start to load the field cache entry for the next reader,
then the key may still be hanging around uncollected.

At that point, it may run a GC when you allocate the new cache, but that's
only the first GC. It can't free the cached data until after the next call
to expungeStaleEntries, so for a while you have both caches around.

This extra usage could cause things to move into tenured space. Could this
be causing your problem?

A workaround would be to cause some method to be called on the WeakHashMap.
You don't want to call get(), since that will try to populate the cache.
Maybe try putting a small value into the cache and doing a GC, and see if
your memory drops then.
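Sketched out, something like this (any mutating method would do; put() with
a throwaway key just avoids populating a real cache entry the way a
FieldCache lookup would):

import java.util.Map;

public final class WeakMapPoker {
    // Nudge a WeakHashMap so it runs its internal expunge pass and drops
    // the hard references to values whose keys have been collected.
    static void poke(Map<Object, Object> weakMap) {
        Object dummy = new Object();
        weakMap.put(dummy, Boolean.TRUE);
        weakMap.remove(dummy); // clean up the throwaway entry
    }
}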


Tom


On Mon, Dec 7, 2009 at 1:48 PM, TCK wrote:

Thanks for the response. But I'm definitely calling close() on the old
reader and opening a new one (not using reopen). Also, to simplify the
analysis, I did my test with a single-threaded requester to eliminate any
concurrency issues.

I'm doing:

sSearcher.getIndexReader().close(); // close the old reader explicitly
sSearcher.close();                  // this actually seems to be a no-op
IndexReader newIndexReader = IndexReader.open(newDirectory);
sSearcher = new IndexSearcher(newIndexReader);

Btw, isn't it bad practice anyway to have an unbounded cache? Are there any
plans to replace the HashMaps used for the innerCaches with an actual
size-bounded cache with some eviction policy (perhaps EhCache or something)?
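For illustration, the kind of eviction I mean, sketched as a minimal LRU on
top of LinkedHashMap (just the idea, not a proposal for FieldCacheImpl's
actual key types):

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: evicts the least-recently-accessed entry once the
// map grows past maxEntries.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true gives LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}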

Thanks again,
TCK




On Mon, Dec 7, 2009 at 4:37 PM, Erick Erickson <erickerickson@gmail.com> wrote:
What this sounds like is that you're not really closing your
readers even though you think you are. Sorting indeed uses up
significant memory when it populates internal caches and keeps
it around for later use (which is one of the reasons that warming
queries matter). But if you really do close the reader, I'm pretty
sure the memory should be GC-able.

One thing that trips people up is IndexReader.reopen(). If it
returns a reader different from the original, you *must* close the
old one. If you don't, the old reader is still hanging around and
memory won't be returned. An example from the Javadocs:

IndexReader reader = ...
...
IndexReader newReader = reader.reopen();
if (newReader != reader) {
  ... // reader was reopened
  reader.close();
}
reader = newReader;
...


If this is irrelevant, could you post your close/open code?

HTH

Erick


On Mon, Dec 7, 2009 at 4:27 PM, TCK <moonwatcher32329@gmail.com> wrote:
Hi,
I'm having heap memory issues when I do Lucene queries involving sorting
by a string field. Such queries seem to load a lot of data into the heap.
Moreover, Lucene seems to hold on to references to this data even after the
index reader has been closed and a full GC has been run. One consequence is
that in my generational heap configuration, a lot of memory gets promoted
to tenured space each time I close the old index reader and open and query
with a new one; the tenured space eventually gets fragmented, causing a lot
of promotion failures, which result in JVM hangs while the JVM does
stop-the-world GCs.

Does anyone know any workarounds to avoid these memory issues when doing
such Lucene queries?

My profiling showed that even after a full GC, Lucene is holding on to a
lot of references to field value data, notably via
FieldCacheImpl/ExtendedFieldCacheImpl. I noticed that the WeakHashMap
readerCaches use unbounded HashMaps as the innerCaches, and I used
reflection to replace these innerCaches with dummy empty HashMaps, but I'm
still seeing the same behavior. I wondered if anyone has gone through these
same issues before and could offer any advice.

Thanks a lot,
TCK