Hi,

I want to share an object (a Lucene IndexWriter instance) between mappers
running on the same node for one job (not across multiple jobs). Please correct
me if I am wrong:

If I set the property mapred.job.reuse.jvm.num.tasks to -1, then all mappers of
one job will be executed in the same JVM, and in that case, if I create a
static Lucene IndexWriter instance in my mapper class, all mappers running on
the same node will be able to use it.

Thanks,
Tarandeep
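
For reference, a minimal sketch of the setup described above -- an illustration
added here, not code from the thread. It assumes the old
org.apache.hadoop.mapred API (Hadoop 0.19/0.20 era) and a Lucene 2.4-era
IndexWriter; the class name, index path, and field name are made up for the
example.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class IndexingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

  // One IndexWriter per JVM, created lazily on first use. With
  // mapred.job.reuse.jvm.num.tasks set to -1 the JVM is reused for successive
  // tasks of this job, so the static field survives across those tasks.
  // Committing/closing the writer is left out of this sketch.
  private static IndexWriter writer;

  private static synchronized IndexWriter getWriter() throws IOException {
    if (writer == null) {
      // Hypothetical local path; the constructor is Lucene 2.x-specific.
      writer = new IndexWriter("/tmp/mapper-index", new StandardAnalyzer(),
                               true, IndexWriter.MaxFieldLength.UNLIMITED);
    }
    return writer;
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<NullWritable, NullWritable> output,
                  Reporter reporter) throws IOException {
    Document doc = new Document();
    doc.add(new Field("body", value.toString(), Field.Store.NO,
                      Field.Index.ANALYZED));
    getWriter().addDocument(doc);
  }
}

On the driver side, the property from the message above would be set with
something like jobConf.setInt("mapred.job.reuse.jvm.num.tasks", -1). As the
replies below clarify, even with that setting each JVM runs only one task at a
time, so the static writer is shared across sequential tasks in one JVM rather
than across concurrent mappers on the node.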


  • Kevin Peterson at Jun 4, 2009 at 7:49 am

    On Wed, Jun 3, 2009 at 10:59 AM, Tarandeep Singh wrote:

    I want to share an object (a Lucene IndexWriter instance) between mappers
    running on the same node for one job (not across multiple jobs). Please
    correct me if I am wrong: if I set the property
    mapred.job.reuse.jvm.num.tasks to -1, then all mappers of one job will be
    executed in the same JVM, and in that case, if I create a static Lucene
    IndexWriter instance in my mapper class, all mappers running on the same
    node will be able to use it.

    Not quite. JVM reuse controls whether the JVM is terminated after a single
    mapper run and a new one created for the next task. It doesn't influence
    how many JVMs are created -- you will still get one JVM per mapper or
    reducer.

    I think there is, or was, or maybe a patch enables, what you are asking for,
    IIRC.
  • Tarandeep Singh at Jun 4, 2009 at 6:17 pm
    Thanks, Kevin, for the clarification. I ran a couple of tests as well and
    the system behaved exactly as you said.

    So now the question is, how can I achieve what I want to do: share an
    object (a Lucene IndexWriter instance) between mappers running on the same
    node? I thought of running the IndexWriter separately, outside of Hadoop,
    and using RMI/sockets etc. to communicate with it, but I am optimistic that
    there is a simpler way than this. Any thoughts?

    Also, what if I modify the default behaviour of Hadoop to run all mappers
    on a node in one JVM? (Not sure if that will be possible in the first
    place, just a thought.)

    -Tarandeep
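
On the RMI/socket idea Tarandeep raises above: a bare-bones sketch of what the
client side might look like, purely illustrative and not from the thread. It
assumes a hypothetical per-node daemon listening on localhost:5555 that owns
the single IndexWriter and indexes each line of text it receives; the daemon
itself is not shown.

import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;

/** Connects a mapper to a hypothetical node-local indexing daemon. */
public class IndexClient implements Closeable {

  private final Socket socket;
  private final PrintWriter out;

  public IndexClient(String host, int port) throws IOException {
    socket = new Socket(host, port);
    // Autoflush so each document is pushed to the daemon as it is sent.
    out = new PrintWriter(
        new OutputStreamWriter(socket.getOutputStream(), "UTF-8"), true);
  }

  /** Ship one document (here simply a line of text) to the daemon. */
  public void index(String docText) {
    out.println(docText);
  }

  public void close() throws IOException {
    out.close();
    socket.close();
  }
}

A mapper would open the client once per task (e.g. new
IndexClient("localhost", 5555) in configure()), call index(value.toString())
in map(), and close it in close(). Whether this ends up simpler than the
alternatives is exactly the open question posed in the post above.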

