Grokbase Groups Lucene dev July 2008
FAQ
That is correct.

ANY library that uses thread locals is going to have the same issue,
and in many cases the containers themselves use static thread locals.

This is just another reason why (I think) a better solution when
using Lucene as a service is to run it in its own process, and
communicate with the process from the web container. You have better
memory/request control - all of the recent memory caching
enhancements are worthless when running Lucene along side other apps
in the same web container - future JSRs might make this feasible, but
not yet. It allows for better process monitoring, etc. So many
advantages - very few downsides


On Jul 14, 2008, at 8:38 AM, Roman Puchkovskiy wrote:


1. It is not obligatory for servlet container to deploy
applications from the
same thread. It easily can happen, that first instance of webapp is
loaded
by the main thread at container start-up, then repedloy happens
because a
hot deployer discovers that WAR has changed (and this hot deployer
easily
may work in another thread, not in the main one), and after that
the webapp
is redeployed in response to HTTP request (Tomcat manager is
example, which
is webapp itself and may redeploy other applications). Here a bunch of
threads deploy webapp, and they are all different.
2. What if a user just wants to undeploy the webapp, without the
redeploy?
He expects the memory to be released, but it will not be.


Robert Engels wrote:
If you attempting to restart the web app context, then when restarted
the Thread will still be the same and as it runs, new ThreadLocals
will be created which will cause the old to be removed, eventually
allowing the class loader to be unloaded.
On Jul 14, 2008, at 12:31 AM, Roman Puchkovskiy wrote:


Oops, sent too early, excuse me :) One more to say: yes, Yonik is
correct.
Values remain strong-referenced until the stale values are removed
from the
table. Just a quote from the ThreadLocals' javadoc:
"However, since reference queues are not used, stale entries are
guaranteed
to be removed only when the table starts running out of space."
How do you suggest to force the removal of stale entries to the
Lucene user?


Robert Engels wrote:
You are mistaken - Yonik's comment in that thread is correct
(although it is not just at table resize - any time a
ThreadLocal is
added, and any time the ThreadLocal is not found in its direct hash
it might clear entries).

The ThreadLocals map only has a WeakReference to the
ThreadLocal, so
if that is the only reference, it will be GC'd - eventually, and
then
it will be cleared as new ThreadLocals are created.

With a static reference, the thread can reference the
ThreadLocal at
any time, and thus the WeakReference will not be cleared.

If the object is VERY large, and new ThreadLocals are not
created it
could cause a problem, but I don't think that is the case with
Lucene, as the objects stored in ThreadLocals are designed to live
for the life of the SegmentReader/IndexReader and thread.

On Jul 12, 2008, at 2:12 AM, Roman Puchkovskiy wrote:


Well, possibly I'm mistaken, but it seems that this affects non-
static fields
too. Please see
http://www.nabble.com/ThreadLocal-in-SegmentReader-to18306230.html
where the
use case is described in the details.
In short: it seems that the scope of ThreadLocals does not matter.
What
really matters is that they are referenced by ThreadLocals map in
the thread
which is still alive.


Robert Engels wrote:
This is only an issue for static ThreadLocals ...
On Jul 11, 2008, at 11:32 PM, Roman Puchkovskiy wrote:


The problem here is not because ThreadLocal instances are not
GC'd
(they are
GC'd, and your test shows this clearly).
But even one instance which is not removed from its Thread is
enough to
prevent the classloader from being unloaded, and that's the
problem.


Michael McCandless-2 wrote:
OK, I created a simple test to test this (attached). The test
just
runs 10 threads, each one creating a 100 KB byte array which is
stored
into a ThreadLocal, and then periodically the ThreadLocal is
replaced
with a new one. This is to test whether GC of a ThreadLocal,
even
though the thread is still alive, in fact leads to GC of the
objects
held in the ThreadLocal.

Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objects
are in
fact properly collected.

So this is not a leak but rather a "delayed collection" issue.
Java's
GC is never guaranteed to be immediate, and apparently when
using
ThreadLocals it's even less immediate than "normal". In the
original
issue, if other things create ThreadLocals, then eventually
Lucene's
unreferenced ThreadLocals would be properly collected.

So I think we continue to use non-static ThreadLocals in
Lucene...

Mike






robert engels wrote:
Once again, these are "static" thread locals. A completely
different
issue. Since the object is available statically, the weak
reference
cannot be cleared so stale entries will never be cleared as
long as
the thread is alive.
On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:

Just a few examples of "problems" using ThreadLocals.

http://opensource.atlassian.com/projects/hibernate/browse/
HHH-2481
http://www.theserverside.com/news/thread.tss?thread_id=41473

Once again, I'm not pointing to Lucene SegmentReader as a
"bad"
implementation, and maybe the current "problems" of
ThreadLocals
are not a problem for SegmentReader but it seems safer to use
ThreadLocals to pass context information which is cleared
when
the
call exits instead of storing long-lived objects.


robert engels wrote:
Aside from the pre-1.5 thread local "perceived leak", there
are no
issues with ThreadLocals if used properly.

There is no need for try/finally blocks, unless you MUST
release
resources immediately, usually this is not the case,
which is
why
a ThreadLocal is used in the first place.

From the ThreadLocalMap javadoc...

/**
* ThreadLocalMap is a customized hash map suitable
only for
* maintaining thread local values. No operations are
exported
* outside of the ThreadLocal class. The class is
package
private to
* allow declaration of fields in class Thread. To help
deal
with
* very large and long-lived usages, the hash table
entries
use
* WeakReferences for keys. However, since reference
queues
are not
* used, stale entries are guaranteed to be removed only
when
* the table starts running out of space.
*/

/**
* Heuristically scan some cells looking for stale
entries.
* This is invoked when either a new element is
added, or
* another stale one has been expunged. It
performs a
* logarithmic number of scans, as a balance
between no
* scanning (fast but retains garbage) and a
number of
scans
* proportional to number of elements, that would
find all
* garbage but would cause some insertions to take O
(n)
time.
*
* @param i a position known NOT to hold a stale
entry.
The
* scan starts at the element after i.
*
* @param n scan control: <tt>log2(n)</tt> cells are
scanned,
* unless a stale entry one is found, in which case
* <tt>log2(table.length)-1</tt> additional cells
are
scanned.
* When called from insertions, this parameter is
the
number
* of elements, but when from replaceStaleEntry, it
is the
* table length. (Note: all this could be changed
to be
either
* more or less aggressive by weighting n instead of
just
* using straight log n. But this version is simple,
fast,
and
* seems to work well.)
*
* @return true if any stale entries have been
removed.
*/


The instance ThreadLocals (and what the refer to) will be
GC'd
when the containing Object is GC'd.

There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal
refers
to an object that has native resources (e.g. file handles),
it may
not be released until other thread locals are created by the
thread (or the thread terminates).

You can avoid this "delay" by calling remove(), but in most
applications it should never be necessary - unless a very
strange
usage...
On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:

From what I know, storing objects in ThreadLocal is safe as
long
as you release the object within a try {} finall {}
block or
store objects which are independent of the rest of the code
(no
dependencies).Otherwise it can get pretty tricky(memory
leaks,
classloader problems) after awhile.

It is pretty convenient to pass HTTP request information
with a
ThreadLocal in a servlet(but you should cleanup the
variable
before leaving the servlet) but I'm not sure how safe it
is in
this case.

robert engels wrote:
Using synchronization is a poor/invalid substitute for
thread
locals in many cases.

The point of the thread local in these referenced cases
is too
allow streaming reads on a file descriptor. if you use a
shared
file descriptor/buffer you are going to continually
invalidate
the buffer.
On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:


Well ... SegmentReader uses ThreadLocal to hold a thread-
private instance of TermVectorsReader, to avoid
synchronizing
per-document when loading term vectors.

Clearing this ThreadLocal value per call to
SegmentReader's
methods that load term vectors would defeat its purpose.

Though, of course, we then synchronize on the underlying
file
(when using FSDirectory), so perhaps we are really not
saving
much by using ThreadLocal here. But we are looking to
relax
that low level synchronization with LUCENE-753.

Maybe we could make our own ThreadLocal that just uses a
HashMap, which we'd have to synchronize on when
getting the
per-
thread instances. Or, go back to sharing a single
TermVectorsReader and synchronize per-document.

Jason has suggested moving to a model where you ask the
IndexReader for an object that can return term vectors /
stored
fields / etc, and then you interact with that many
times to
retrieve each doc. We could then synchronize only on
retrieving that object, and provide a thread-private
instance.

It seems like we should move away from using
ThreadLocal in
Lucene and do "normal" synchronization instead.

Mike

Adrian Tarau wrote:
Usually ThreadLocal.remove() should be called at the end
(in a
finally block), before the current call leaves your
code.

Ex : if during searching ThreadLocal is used, every
search
(..)
method should cleanup any ThreadLocal variables, or even
deeper in the implementation. When the call leaves
Lucene
any
used ThreadLocal should be cleaned up.

Michael McCandless wrote:
ThreadLocal, which we use in several places in Lucene,
causes
a leak in app servers because the classloader never
fully
deallocates Lucene's classes because the ThreadLocal is
holding strong references.

Yet, ThreadLocal is very convenient for avoiding
synchronization.

Does anyone have any ideas on how to solve this w/o
falling
back to "normal" synchronization?

Mike

Begin forwarded message:
From: "Yonik Seeley" <yonik@apache.org>
Date: July 7, 2008 3:30:28 PM EDT
To: java-user@lucene.apache.org
Subject: Re: ThreadLocal in SegmentReader
Reply-To: java-user@lucene.apache.org

On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
wrote:
So now I'm confused: the SegmentReader itself
should no
longer be reachable,
assuming you are not holding any references to your
IndexReader.

Which means the ThreadLocal instance should no
longer be
reachable.
It will still be referenced from the Thread(s)
ThreadLocalMap
The key (the ThreadLocal) will be weakly referenced,
but the
values
(now stale) are strongly referenced and won't be
actually
removed
until the table is resized (under the Java6 impl at
least).
Nice huh?

-Yonik

------------------------------------------------------
--
--
--
---------
To unsubscribe, e-mail: java-user-
unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-
help@lucene.apache.org

-------------------------------------------------------
--
--
--
--------
To unsubscribe, e-mail: java-dev-
unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-
help@lucene.apache.org

--------------------------------------------------------
--
--
--
-------
To unsubscribe, e-mail: java-dev-
unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-
help@lucene.apache.org

---------------------------------------------------------
--
--
--
------
To unsubscribe, e-mail: java-dev-
unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-
help@lucene.apache.org

----------------------------------------------------------
--
--
--
-----
To unsubscribe, e-mail: java-dev-
unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-
help@lucene.apache.org

-----------------------------------------------------------
--
--
--
----
To unsubscribe, e-mail: java-dev-
unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-
help@lucene.apache.org


---------------------------------------------------------------
--
--
--
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-
help@lucene.apache.org
--
View this message in context: http://www.nabble.com/Fwd%3A-
ThreadLocal-in-SegmentReader-tp18326720p18416022.html
Sent from the Lucene - Java Developer mailing list archive at
Nabble.com.


----------------------------------------------------------------
--
--
-
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


-----------------------------------------------------------------
--
--
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

--
View this message in context: http://www.nabble.com/Fwd%3A-
ThreadLocal-in-SegmentReader-tp18326720p18416841.html
Sent from the Lucene - Java Developer mailing list archive at
Nabble.com.


------------------------------------------------------------------
--
-
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

-------------------------------------------------------------------
--
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

--
View this message in context: http://www.nabble.com/Fwd%3A-
ThreadLocal-in-SegmentReader-tp18326720p18437716.html
Sent from the Lucene - Java Developer mailing list archive at
Nabble.com.


--------------------------------------------------------------------
-
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

--
View this message in context: http://www.nabble.com/Fwd%3A-
ThreadLocal-in-SegmentReader-tp18326720p18444095.html
Sent from the Lucene - Java Developer mailing list archive at
Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Search Discussions

Discussion Posts

Previous

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 22 of 22 | next ›
Discussion Overview
groupdev @
categorieslucene
postedJul 7, '08 at 9:10p
activeJul 14, '08 at 1:55p
posts22
users5
websitelucene.apache.org

People

Translate

site design / logo © 2021 Grokbase