Hello,

I want to view the mapper output for a Hadoop streaming job (one that runs a
shell script), but I am not able to find it in any of the log files. Where
should I look?

Thanks,
Aishwarya


  • Robert Evans at Oct 6, 2011 at 7:39 pm
    A streaming job's stderr is logged for the task, but its stdout is what is sent to the reducer. The simplest way to get the map output is to turn off the reducers and then look at the job's output in HDFS.
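
    For example, a minimal sketch (the jar path, mapper.sh, and input file are assumptions borrowed from later in this thread; setting mapred.reduce.tasks to 0 makes the job map-only, so each mapper's output is written straight to HDFS):

    hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar -D mapred.reduce.tasks=0 -file ~/mapper.sh -mapper ./mapper.sh -input ../foo.txt -output output
    # With no reducers, the raw map output lands in output/part-* files:
    hadoop fs -cat 'output/part-*' | head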

    --Bobby Evans

  • Aishwarya Venkataraman at Oct 6, 2011 at 8:31 pm
    I ran the following (I am using IdentityReducer):

    ./hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar -file
    ~/mapper.sh -mapper ~/mapper.sh -input ../foo.txt -output output

    When I do ./hadoop dfs -cat output/* I do not see any output on the screen.
    Is this how I view the output of the mapper?

    Thanks,
    Aishwarya

    --
    Thanks,
    Aishwarya Venkataraman
    avenkata@cs.ucsd.edu
    Graduate Student | Department of Computer Science
    University of California, San Diego
  • Robert Evans at Oct 6, 2011 at 8:42 pm
    Aishwarya,

    Are you running in local mode? If not, you probably want to run:

    hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar -file ~/mapper.sh -mapper ./mapper.sh -input ../foo.txt -output output

    You may also want to run hadoop fs -ls output/* to see what files were produced; if your mappers failed for some reason, there will be no files in the output directory. And you may want to look at the stderr logs for your tasks through the web UI.
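
    A quick check sequence, as a sketch (paths assume the command above):

    hadoop fs -ls output                      # no part files here usually means the mappers failed
    hadoop fs -cat output/part-00000 | head   # peek at what a successful mapper wrote

    The stderr shown in the web UI is the same data that lands on each task node, typically under the TaskTracker's userlogs directory.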

    --Bobby Evans

  • Aishwarya Venkataraman at Oct 7, 2011 at 6:43 am
    Robert,

    My mapper job fails. I am basically trying to run a crawler on Hadoop, and
    Hadoop kills the crawler (mapper) if it has not heard from it for a certain
    timeout period. But I already have a timeout set in my mapper (500 seconds),
    which is less than Hadoop's timeout (900 seconds). The mapper just stalls
    for some reason. My mapper code is as follows:

    while read line; do
      # Quote the expansion so wget's multi-line output is echoed as-is.
      result="$(wget -O - --timeout=500 "http://$line" 2>&1)"
      echo "$result"
    done

    Any idea why my mapper is getting stalled?

    I don't see the difference between the command you have given and the one I
    ran. I am not running in local mode. Is there some way I can get
    intermediate mapper outputs? I would like to see which site the mapper
    is stalling on.
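
    One possible cause, offered as an assumption rather than something confirmed in this thread: wget's --timeout applies per connection attempt, and wget retries up to 20 times by default, so a single stalled host can keep the loop busy well past 500 seconds without emitting anything, at which point Hadoop's inactivity timeout kills the task. A hedged sketch that caps retries and emits streaming's reporter:status heartbeat (stderr lines in that form update the task's status and reset the timeout clock):

    while read line; do
      # Heartbeat: Hadoop streaming treats stderr lines of the form
      # reporter:status:<message> as status updates, which reset the task timeout.
      echo "reporter:status:fetching $line" >&2
      # --tries=1 makes --timeout an actual upper bound per URL.
      result="$(wget -O - --tries=1 --timeout=500 "http://$line" 2>&1)"
      echo "$result"
    done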

    Thanks,
    Aishwarya

  • Robert Evans at Oct 7, 2011 at 3:02 pm
    The difference in the command is where the shell script comes from. With -mapper ~/mapper.sh, every task looks for the script in your home directory on the node where it runs. If you have a small cluster with your home directory mounted on all of the nodes, that is not a big deal, but on a large cluster NFS-mounting the directory on every box can cause a lot of issues. On a large cluster you should use the distributed cache to send the script over (you are already shipping it through the distributed cache by using the -file option), and -mapper ./mapper.sh runs that shipped copy.
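
    Side by side (a sketch; both lines assume -file ~/mapper.sh also appears on the command line):

    -mapper ~/mapper.sh   # each node resolves ~ itself and runs the copy in your NFS-mounted home directory
    -mapper ./mapper.sh   # runs the copy that -file shipped into the task's working directory via the distributed cache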

    I am not completely sure why it would be timing out. Are all of the mappers timing out, or is it just a single one? One thing you can do is run your streaming job with cat (an identity mapper) instead of mapper.sh, then use that output as input to the script on your local box:

    ./hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar -mapper cat -input ../foo.txt -output output
    ./hadoop fs -cat output/part-00000 | ~/mapper.sh

    # Or pick a different part file, one that corresponds to the mapper task that is timing out.
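
    To narrow down which site is stalling, one option (a sketch, assuming one hostname per input line as in the mapper above) is to replay a part file locally and watch stderr:

    ./hadoop fs -cat output/part-00000 | while read line; do
      echo "trying: $line" >&2
      time wget -O /dev/null --tries=1 --timeout=30 "http://$line"
    done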

    --Bobby Evans

