Grokbase Groups Lucene dev July 2008
FAQ
As stated in the cited email, they use a static ThreadLocal - this is
a different issue (and I don't think Lucene uses any of these).

It is still not a bug in this case, but just the same "delayed"
release behavior. The problem is that the some class loaders
probably are never released as the Threads are never terminated, yet
they do not create any more ThreadLocals (thus causing some stale
slots to never be cleared).

I am not sure that Lucene has many long-lived objects... maybe
SegmentReaders if the index rarely changes/updated.

We have done extensive memory profiling of Lucene - there are no
inherent leaks (as of the 1.9 code set). I don't think anything has
changed since then related to this (expect using some
FixedThreadLocal to improve the situation in some cases).

If you search the archives for FixedThreadLocal you can find a simple
implementation that solves the perceived problem.

On Jul 9, 2008, at 3:59 PM, Adrian Tarau wrote:

As soon as the object can be collected...which doesn't happen in
case of (some) thread pools.

Here is an example : http://www.opensubscriber.com/message/ojb-
user@db.apache.org/1500135.html (it's pretty old but you can find
newer examples googling around). There are some replies with
workarounds.

I'm not saying there is something wrong using ThreadLocal in
Lucene, but it could cause problems when classloaders have to be
released, as Michael noticed.

As Jason suggested, a global manager for ThreadLocals could be the
solution.
I prefer to use ThreadLocal to pass "context" objects but I didn't
use them to store long-lived objects. I'm always releasing the
object after it was used. I'm not sure if is too preventive, but I
had some similar problems with web applications(redeploy),
ThreadLocals and long-lived objects.

robert engels wrote:
Aside from the pre-1.5 thread local "perceived leak", there are no
issues with ThreadLocals if used properly.

There is no need for try/finally blocks, unless you MUST release
resources immediately, usually this is not the case, which is why
a ThreadLocal is used in the first place.

From the ThreadLocalMap javadoc...

/**
* ThreadLocalMap is a customized hash map suitable only for
* maintaining thread local values. No operations are exported
* outside of the ThreadLocal class. The class is package
private to
* allow declaration of fields in class Thread. To help deal
with
* very large and long-lived usages, the hash table entries use
* WeakReferences for keys. However, since reference queues
are not
* used, stale entries are guaranteed to be removed only when
* the table starts running out of space.
*/

/**
* Heuristically scan some cells looking for stale entries.
* This is invoked when either a new element is added, or
* another stale one has been expunged. It performs a
* logarithmic number of scans, as a balance between no
* scanning (fast but retains garbage) and a number of scans
* proportional to number of elements, that would find all
* garbage but would cause some insertions to take O(n) time.
*
* @param i a position known NOT to hold a stale entry. The
* scan starts at the element after i.
*
* @param n scan control: <tt>log2(n)</tt> cells are scanned,
* unless a stale entry one is found, in which case
* <tt>log2(table.length)-1</tt> additional cells are
scanned.
* When called from insertions, this parameter is the number
* of elements, but when from replaceStaleEntry, it is the
* table length. (Note: all this could be changed to be
either
* more or less aggressive by weighting n instead of just
* using straight log n. But this version is simple, fast,
and
* seems to work well.)
*
* @return true if any stale entries have been removed.
*/


The instance ThreadLocals (and what the refer to) will be GC'd
when the containing Object is GC'd.

There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal refers
to an object that has native resources (e.g. file handles), it may
not be released until other thread locals are created by the
thread (or the thread terminates).

You can avoid this "delay" by calling remove(), but in most
applications it should never be necessary - unless a very strange
usage...
On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:

From what I know, storing objects in ThreadLocal is safe as long
as you release the object within a try {} finall {} block or
store objects which are independent of the rest of the code(no
dependencies).Otherwise it can get pretty tricky(memory leaks,
classloader problems) after awhile.

It is pretty convenient to pass HTTP request information with a
ThreadLocal in a servlet(but you should cleanup the variable
before leaving the servlet) but I'm not sure how safe it is in
this case.

robert engels wrote:
Using synchronization is a poor/invalid substitute for thread
locals in many cases.

The point of the thread local in these referenced cases is too
allow streaming reads on a file descriptor. if you use a shared
file descriptor/buffer you are going to continually invalidate
the buffer.
On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:


Well ... SegmentReader uses ThreadLocal to hold a thread-
private instance of TermVectorsReader, to avoid synchronizing
per-document when loading term vectors.

Clearing this ThreadLocal value per call to SegmentReader's
methods that load term vectors would defeat its purpose.

Though, of course, we then synchronize on the underlying file
(when using FSDirectory), so perhaps we are really not saving
much by using ThreadLocal here. But we are looking to relax
that low level synchronization with LUCENE-753.

Maybe we could make our own ThreadLocal that just uses a
HashMap, which we'd have to synchronize on when getting the per-
thread instances. Or, go back to sharing a single
TermVectorsReader and synchronize per-document.

Jason has suggested moving to a model where you ask the
IndexReader for an object that can return term vectors / stored
fields / etc, and then you interact with that many times to
retrieve each doc. We could then synchronize only on
retrieving that object, and provide a thread-private instance.

It seems like we should move away from using ThreadLocal in
Lucene and do "normal" synchronization instead.

Mike

Adrian Tarau wrote:
Usually ThreadLocal.remove() should be called at the end(in a
finally block), before the current call leaves your code.

Ex : if during searching ThreadLocal is used, every search(..)
method should cleanup any ThreadLocal variables, or even
deeper in the implementation. When the call leaves Lucene any
used ThreadLocal should be cleaned up.

Michael McCandless wrote:
ThreadLocal, which we use in several places in Lucene, causes
a leak in app servers because the classloader never fully
deallocates Lucene's classes because the ThreadLocal is
holding strong references.

Yet, ThreadLocal is very convenient for avoiding
synchronization.

Does anyone have any ideas on how to solve this w/o falling
back to "normal" synchronization?

Mike

Begin forwarded message:
From: "Yonik Seeley" <yonik@apache.org>
Date: July 7, 2008 3:30:28 PM EDT
To: java-user@lucene.apache.org
Subject: Re: ThreadLocal in SegmentReader
Reply-To: java-user@lucene.apache.org

On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
wrote:
So now I'm confused: the SegmentReader itself should no
longer be reachable,
assuming you are not holding any references to your
IndexReader.

Which means the ThreadLocal instance should no longer be
reachable.
It will still be referenced from the Thread(s) ThreadLocalMap
The key (the ThreadLocal) will be weakly referenced, but the
values
(now stale) are strongly referenced and won't be actually
removed
until the table is resized (under the Java6 impl at least).
Nice huh?

-Yonik

---------------------------------------------------------------
------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-
help@lucene.apache.org

----------------------------------------------------------------
-----
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

-----------------------------------------------------------------
----
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

------------------------------------------------------------------
---
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

-------------------------------------------------------------------
--
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

--------------------------------------------------------------------
-
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Search Discussions

Discussion Posts

Previous

Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 10 of 22 | next ›
Discussion Overview
groupdev @
categorieslucene
postedJul 7, '08 at 9:10p
activeJul 14, '08 at 1:55p
posts22
users5
websitelucene.apache.org

People

Translate

site design / logo © 2021 Grokbase