Tarandeep Singh
Jun 4, 2009 at 6:17 pm
Thanks Kevin for the clarification. I ran a couple of tests as well and the
system behaved exactly as you said.
So now the question is, how can I achieve what I want to do - share an
object (a Lucene IndexWriter instance) between mappers running on the same
node? I thought of running the IndexWriter separately, outside of Hadoop,
and using RMI/sockets etc. to communicate with it, but I am hopeful that
there is a simpler way than this. Any thoughts?
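
To make the RMI idea concrete, this is roughly what I was picturing (just a
rough sketch with Lucene 2.4-era calls; IndexService, IndexServer, the
registry port and the index path are made-up names, not an existing API):

    // IndexService.java - remote interface every mapper on the node calls.
    import java.rmi.Remote;
    import java.rmi.RemoteException;

    public interface IndexService extends Remote {
      void addDocument(String id, String text) throws RemoteException;
      void commit() throws RemoteException;
    }

    // IndexServer.java - standalone process outside Hadoop that owns the
    // single IndexWriter; synchronized so concurrent mappers don't collide.
    import java.io.IOException;
    import java.rmi.Naming;
    import java.rmi.RemoteException;
    import java.rmi.registry.LocateRegistry;
    import java.rmi.server.UnicastRemoteObject;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class IndexServer extends UnicastRemoteObject implements IndexService {
      private final IndexWriter writer;

      public IndexServer(IndexWriter writer) throws RemoteException {
        this.writer = writer;
      }

      public synchronized void addDocument(String id, String text)
          throws RemoteException {
        try {
          Document doc = new Document();
          doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
          doc.add(new Field("text", text, Field.Store.NO, Field.Index.ANALYZED));
          writer.addDocument(doc);
        } catch (IOException e) {
          throw new RemoteException("addDocument failed", e);
        }
      }

      public synchronized void commit() throws RemoteException {
        try {
          writer.commit();
        } catch (IOException e) {
          throw new RemoteException("commit failed", e);
        }
      }

      public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/tmp/node-index",
            new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
        LocateRegistry.createRegistry(1099);
        Naming.rebind("rmi://localhost/IndexService", new IndexServer(writer));
      }
    }

and then each mapper would just do a one-time lookup and reuse the stub:

    IndexService index =
        (IndexService) Naming.lookup("rmi://localhost/IndexService");
    index.addDocument(docId, text);
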
Also, what if I modify the default behaviour of Hadoop to run all the
mappers on a node in one JVM? (Not sure if that will be possible in the
first place - just a thought.)
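
For reference, this is how I am enabling reuse (a sketch with the old
mapred API; IndexingJob is just a placeholder class name):

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(IndexingJob.class);
    // equivalent to setting mapred.job.reuse.jvm.num.tasks to -1 in the config
    conf.setNumTasksToExecutePerJvm(-1); // -1 = no limit on tasks per JVM
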
-Tarandeep
On Thu, Jun 4, 2009 at 12:49 AM, Kevin Peterson wrote:

On Wed, Jun 3, 2009 at 10:59 AM, Tarandeep Singh <[email protected]> wrote:
I want to share an object (a Lucene IndexWriter instance) between mappers
running on the same node for one job (not across multiple jobs). Please
correct me if I am wrong - if I set -1 for the property
mapred.job.reuse.jvm.num.tasks, then all mappers of one job will be executed
in the same JVM, and in that case, if I create a static Lucene IndexWriter
instance in my mapper class, all mappers running on the same node will be
able to use it.
Not quite. The JVM reuse setting controls whether the JVM is terminated
after a single mapper run and a new one created for the next task. It
doesn't influence how many JVMs are created -- you will still get one JVM
per concurrent mapper or reducer.

I think there is, or was, or maybe a patch enables, what you are asking
for, IIRC.
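
In the meantime, the closest you get with stock Hadoop is a lazily
initialized static writer - shared only by the tasks that run one after
another in a reused JVM, never by the concurrent map slots on a node. A
sketch, using the old mapred API and Lucene 2.4-era calls (IndexingMapper
and the index path are made-up names):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class IndexingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      // One writer per JVM. With mapred.job.reuse.jvm.num.tasks = -1 this
      // is reused by the tasks that run sequentially in this JVM, but each
      // concurrent map slot on the node still has its own JVM and writer.
      private static IndexWriter writer;

      private static synchronized IndexWriter getWriter() throws IOException {
        if (writer == null) {
          writer = new IndexWriter("/tmp/task-index", new StandardAnalyzer(),
              true, IndexWriter.MaxFieldLength.UNLIMITED);
        }
        return writer;
      }

      public void map(LongWritable key, Text value,
          OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        Document doc = new Document();
        doc.add(new Field("text", value.toString(),
            Field.Store.NO, Field.Index.ANALYZED));
        getWriter().addDocument(doc);
      }
    }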