Hello,

I am developing an application with MapReduce in which, whenever some
condition is met in a MapTask, I would like to broadcast a message to all
other MapTasks telling them to abort their work. I am not sure whether such
broadcast functionality currently exists in Hadoop MapReduce. Could someone
give me some hints?

Extending Hadoop to support this may be easy, since all the slaves
periodically ping the master. I was thinking of piggybacking one bit of
information from the slave to the master, which would then send this bit
to all the slaves in the next round. Any suggestions on this approach?

Thanks.

With Regards

-----
Chaman Singh Verma
Poona, India
--
View this message in context: http://www.nabble.com/Aborting-Map-Function-tp16722552p16722552.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
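The piggybacking idea can be simulated in plain Java to see how quickly the bit spreads. This is a sketch under stated assumptions: Master and Slave are hypothetical stand-ins (not Hadoop's JobTracker or TaskTracker classes), and heartbeats are modeled as sequential method calls. It shows why "the next round" is the worst case: slaves that ping after the bit is raised learn of it in the same round, and the rest learn in the following one.

```java
import java.util.ArrayList;
import java.util.List;

public class HeartbeatAbortSketch {

    /** Hypothetical stand-in for the master: ORs together the bits piggybacked on heartbeats. */
    static class Master {
        private boolean abortRequested = false;

        /** Handles one heartbeat: absorbs the slave's bit and returns the job-wide bit in the reply. */
        boolean heartbeat(boolean slaveBit) {
            abortRequested |= slaveBit;
            return abortRequested;
        }
    }

    /** Hypothetical stand-in for a slave running one map task. */
    static class Slave {
        boolean conditionMet = false; // set when this task's abort condition fires
        boolean notified = false;     // set once a heartbeat reply carries the abort bit

        void heartbeat(Master master) {
            if (master.heartbeat(conditionMet)) {
                notified = true; // a real task would stop processing records here
            }
        }
    }

    /** Runs heartbeat rounds until every slave has seen the bit; returns the round count, or -1. */
    static int roundsUntilAllNotified(List<Slave> slaves, Master master, int maxRounds) {
        for (int round = 1; round <= maxRounds; round++) {
            for (Slave s : slaves) {
                s.heartbeat(master); // slaves ping the master one after another
            }
            if (slaves.stream().allMatch(s -> s.notified)) {
                return round;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Master master = new Master();
        List<Slave> slaves = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            slaves.add(new Slave());
        }
        slaves.get(2).conditionMet = true; // the third task hits the abort condition
        // Prints 2: tasks that pinged before the third one only learn of it in the next round.
        System.out.println(roundsUntilAllNotified(slaves, master, 10));
    }
}
```

The latency is therefore bounded by two heartbeat intervals, which may matter if heartbeats are infrequent.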


  • Sagar Naik at Apr 16, 2008 at 3:44 pm

    One possible solution could be to use Counters
    (http://hadoop.apache.org/core/docs/r0.16.2/api/org/apache/hadoop/mapred/Counters.html).
    However, it is advisable to look into the details of their implementation
    first, to see whether they can be used as a variable shared across
    multiple processes.
  • Owen O'Malley at Apr 16, 2008 at 5:27 pm

    This is pretty atypical behavior, but you could have each map check
    for the existence of an HDFS file every minute or so. When the
    condition becomes true, create the file, and your maps will exit
    within the next minute. Except on very large clusters, that wouldn't
    be too expensive.

    -- Owen
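A minimal sketch of this polling idea, assuming a local marker file stands in for the HDFS file; in a real job the check would presumably go through Hadoop's FileSystem.exists(Path) rather than java.nio. AbortFlagPoller is a hypothetical helper: the mapper calls shouldAbort() once per record, while the actual existence check runs at most once per interval, which is what keeps the cost low.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

public class AbortFlagPoller {
    private final BooleanSupplier flagExists; // the comparatively expensive existence check
    private final LongSupplier clockMillis;   // time source, injectable for testing
    private final long intervalMillis;        // minimum gap between real checks
    private long lastCheck;
    private boolean aborted = false;
    private int checks = 0;

    public AbortFlagPoller(BooleanSupplier flagExists, LongSupplier clockMillis, long intervalMillis) {
        this.flagExists = flagExists;
        this.clockMillis = clockMillis;
        this.intervalMillis = intervalMillis;
        this.lastCheck = clockMillis.getAsLong() - intervalMillis; // so the first call checks
    }

    /** Cheap enough to call once per record: the real check runs at most once per interval. */
    public boolean shouldAbort() {
        if (aborted) {
            return true; // once seen, the flag is never re-checked
        }
        long now = clockMillis.getAsLong();
        if (now - lastCheck >= intervalMillis) {
            lastCheck = now;
            checks++;
            aborted = flagExists.getAsBoolean();
        }
        return aborted;
    }

    public int checksPerformed() {
        return checks;
    }

    public static void main(String[] args) throws IOException {
        Path marker = Files.createTempDirectory("job").resolve("ABORT"); // stand-in for the HDFS file
        long[] now = {0};                                                // fake clock, in milliseconds
        AbortFlagPoller poller = new AbortFlagPoller(() -> Files.exists(marker), () -> now[0], 60_000);

        System.out.println(poller.shouldAbort()); // false: checked, marker absent
        Files.createFile(marker);                 // the task whose condition fired creates the file
        now[0] = 30_000;
        System.out.println(poller.shouldAbort()); // false: interval not yet elapsed, no check made
        now[0] = 60_000;
        System.out.println(poller.shouldAbort()); // true: next check sees the marker
    }
}
```

Inside a real mapper the clock would simply be System::currentTimeMillis; the fake clock only makes the one-minute interval observable in a demo.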
  • Milind Bhandarkar at Apr 16, 2008 at 6:04 pm
    If you want to kill the whole job (I assume that's what you mean by
    "aborting all map tasks") from a mapper, you can use:

    // 'job' is the JobConf that was passed to the mapper's configure() method
    new JobClient(job).getJob(job.get("mapred.job.id")).killJob();

    - milind

    --
    Milind Bhandarkar, Chief Spammer, Grid Team
    Y!IM: GridSolutions
    408-349-2136
    (milindb@yahoo-inc.com)
  • Andrzej Bialecki at Apr 17, 2008 at 8:25 am

    See also HADOOP-490. I use the message-queue facility (HADOOP-368) in
    my applications, but it works only for infrequent communication and
    smaller clusters.

    I still think that the job control protocol should allow sending
    "signals" to all tasks of a job. This would eliminate the need for
    polling, because applications could use a simple listener.
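The listener proposal can be illustrated with a plain-Java observer sketch. JobSignals and MapTaskStub are hypothetical; as the post implies, the job control protocol offered no such signals, which is exactly what is being proposed. Each task registers a listener once and is told about the abort as soon as it is sent, instead of repeatedly asking.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class JobSignalSketch {

    /** Hypothetical job-wide signal channel (not a Hadoop API). */
    static class JobSignals {
        // Thread-safe list, since tasks may register while a signal is being delivered.
        private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

        void addListener(Consumer<String> listener) {
            listeners.add(listener);
        }

        /** Pushes the signal to every registered task; no task needs to poll. */
        void send(String signal) {
            for (Consumer<String> listener : listeners) {
                listener.accept(signal);
            }
        }
    }

    /** Hypothetical map task that stops once it receives "ABORT". */
    static class MapTaskStub implements Consumer<String> {
        // volatile: a real delivery thread could set this while the map loop reads it
        private volatile boolean aborted = false;

        @Override
        public void accept(String signal) {
            if ("ABORT".equals(signal)) {
                aborted = true;
            }
        }

        boolean isAborted() {
            return aborted;
        }
    }

    public static void main(String[] args) {
        JobSignals signals = new JobSignals();
        List<MapTaskStub> tasks = List.of(new MapTaskStub(), new MapTaskStub(), new MapTaskStub());
        tasks.forEach(signals::addListener);

        signals.send("ABORT"); // e.g. one task detected its condition
        System.out.println(tasks.stream().allMatch(MapTaskStub::isAborted)); // true
    }
}
```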

    --
    Best regards,
    Andrzej Bialecki <><
    Information Retrieval, Semantic Web
    Embedded Unix, System Integration
    http://www.sigram.com Contact: info at sigram dot com

Discussion Overview
group: common-user @
categories: hadoop
posted: Apr 16, '08 at 3:29p
active: Apr 17, '08 at 8:25a
posts: 5
users: 5
website: hadoop.apache.org...
irc: #hadoop
