I'm having heap memory issues when I run Lucene queries that sort by a
string field. Such queries seem to load a lot of data into the heap.
Moreover, Lucene seems to hold on to references to this data even after the
index reader has been closed and a full GC has been run. One consequence is
that, in my generational heap configuration, a lot of memory gets promoted
to tenured space each time I close the old index reader and then open and
query with a new one. The tenured space eventually gets fragmented, causing
a lot of promotion failures, which result in JVM hangs while the JVM does
stop-the-world GCs.

Does anyone know of any workarounds to avoid these memory issues when
running such Lucene queries?

My profiling showed that, even after a full GC, Lucene is holding on to a
lot of references to field value data, notably via
FieldCacheImpl/ExtendedFieldCacheImpl. I noticed that the WeakHashMap
readerCaches use unbounded HashMaps as the innerCaches, so I used reflection
to replace these innerCaches with dummy empty HashMaps, but I'm still seeing
the same behavior. I wonder if anyone has been through the same issues
before and could offer any advice.
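
To make the reflection workaround concrete, here is a minimal sketch of the
idea on a stand-in class. It does not use Lucene's own classes: StandInCache,
its innerCache field, and swapInnerCache are hypothetical names made up for
illustration. The sketch also shows one general Java point that may or may
not be relevant here: swapping the map only empties the cache itself, and
the old values stay reachable for as long as anything else still refers to
them.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a cache that keeps its entries in a private,
// unbounded HashMap. This is NOT Lucene's FieldCacheImpl.
class StandInCache {
    private Map<String, String[]> innerCache = new HashMap<String, String[]>();

    String[] get(String field) {
        String[] values = innerCache.get(field);
        if (values == null) {
            // Pretend these values were loaded from the index for sorting.
            values = new String[] {"a", "b", "c"};
            innerCache.put(field, values);
        }
        return values;
    }

    int size() {
        return innerCache.size();
    }
}

class InnerCacheSwapSketch {
    // Replace the private map with an empty one via reflection.
    // Returns the old map so the caller can inspect what it still holds.
    @SuppressWarnings("unchecked")
    static Map<String, String[]> swapInnerCache(StandInCache cache) {
        try {
            Field f = StandInCache.class.getDeclaredField("innerCache");
            f.setAccessible(true);
            Map<String, String[]> old = (Map<String, String[]>) f.get(cache);
            f.set(cache, new HashMap<String, String[]>());
            return old;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflection swap failed", e);
        }
    }

    public static void main(String[] args) {
        StandInCache cache = new StandInCache();
        String[] held = cache.get("title"); // a second reference to the values

        Map<String, String[]> old = swapInnerCache(cache);

        // The cache now looks empty, but 'held' and 'old' still keep the
        // original array strongly reachable, so a GC cannot reclaim it.
        System.out.println("entries in cache after swap: " + cache.size());
        System.out.println("old map still has the values: " + (old.get("title") == held));
    }
}
```

In this sketch the data only becomes collectable once both held and old go
out of scope, which is why I suspect something other than the innerCaches
may still be referencing the field values in my case.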
Thanks a lot,