FAQ
Hello,
I have successfully run Hadoop on a 3-node cluster on Red Hat Linux, and I
have several questions.

1. When I submit an MR job using "hadoop jar mr-job.jar", it starts printing
log messages to stdout. How do I make it run in the background?
("hadoop jar mr-job.jar > log &" does not work.) If it can be put in the
background, where do I find the log messages that it used to print to
stdout?
2. While the MR job is running, will it be affected or killed if I press
Ctrl-C? It seems not, since I can see the tasktracker is still running, but
I am not sure.
3. While the MR job is running, if I stop one of the tasktrackers/nodes in
the cluster using hadoop-daemon.sh, will the partial results of the maps and
reduces on that tasktracker/node be submitted to the namenode and merged
with the results completed by the other tasktrackers? Is it possible to
restart a tasktracker from the point where it was stopped?

Thank you in advance.
- Kevin Tse


  • Ted Yu at Jun 6, 2010 at 3:47 am
    1. If you use org.apache.log4j.Logger, you can view the log messages
    through the task log URL, for example:
    http://jobtracker:50060/tasklog?taskid=(attempt_201006052003_0003_m_000001_0&start=-8193)
    Declare the logger in your class like this:
    private final static Logger LOG =
    Logger.getLogger(AbstractRepMapper.class);

    2. Once you have submitted the job to the Hadoop cluster, you need to use
    "hadoop job -kill <job_id>" to stop it.
    See
    http://support.cc.gatech.edu/facilities/instructional-labs/how-to-submit-and-manage-hadoop-jobs
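The "hadoop job -kill <job_id>" command above needs a job ID. As a minimal sketch, assuming the usual naming scheme in which a task attempt ID such as the one in the task log URL embeds the job ID, the ID can be derived with a small shell helper. The function name job_id_from_attempt is hypothetical and is not part of Hadoop.

```shell
# Hypothetical helper (not a Hadoop command): derive a job ID from a task
# attempt ID, assuming the layout
#   attempt_<timestamp>_<job-number>_<m|r>_<task-number>_<attempt-number>
job_id_from_attempt() {
  echo "$1" | sed 's/^attempt_\([0-9]*_[0-9]*\)_.*$/job_\1/'
}

job_id=$(job_id_from_attempt attempt_201006052003_0003_m_000001_0)
echo "$job_id"

# On a machine with the Hadoop client configured, the job could then be
# stopped with the command quoted in the reply:
#   hadoop job -kill "$job_id"
```

The kill command is left as a comment so that the snippet can be tried without a cluster.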
  • Gang Luo at Jun 6, 2010 at 1:10 pm
    1. Try JobClient.submitJob(JobConf). It submits the job to Hadoop without waiting for it to complete.

    2. No.

    3. Tasks running on nodes that fail on the fly will be rescheduled on other nodes. The incomplete results will not be used.

    Thanks,
    -Gang



    ----- Original Message ----
    From: Kevin Tse <kevintse.onjee@gmail.com>
    To: common-user@hadoop.apache.org
    Sent: 2010/6/5 (Sat) 10:00:32 PM
    Subject: Several questions about Hadoop

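On the first question, a shell-only alternative to JobClient.submitJob is sketched below. It assumes that "hadoop jar mr-job.jar > log &" appeared not to work because the client writes its progress messages to stderr rather than stdout, which is worth checking against the log4j configuration of your installation. The function name run_in_background and the file name mr-job.log are hypothetical.

```shell
# Hypothetical helper: run a command in the background and capture both
# stdout and stderr in a log file. Prints the background process ID.
run_in_background() {
  log_file=$1
  shift
  # 2>&1 sends stderr to the same file as stdout; nohup keeps the command
  # running after the terminal is closed; stdin is detached from the terminal.
  nohup "$@" > "$log_file" 2>&1 < /dev/null &
  echo $!
}

# Possible usage on a machine with the Hadoop client installed:
#   run_in_background mr-job.log hadoop jar mr-job.jar
#   tail -f mr-job.log
```

Since pressing Ctrl-C on the client does not kill a submitted job, losing the background client process should not stop the job on the cluster either.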
