Hi dear all,

I am trying to run a Python program with Hadoop streaming. When I run the
program I get the following error.

I have only a mapper and no reducer. I have also tried -D mapred.reduce.tasks
= 0.

My program works when I try $ cat Test3.csv | /home/mapper2.py, but it does
not work in the Hadoop environment. Please let me know if you have any solution.

Command:

root@hadoop1:~# hadoop jar
/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming*.jar
-file /home/mapper2.py -mapper /home/mapper2.py -input
/user/root/Test3.csv -output output-30


Error Message:


packageJobJar: [/home/mapper2.py,
/tmp/hadoop-root/hadoop-unjar1392627538946266816/] []
/tmp/streamjob8930974697799567634.jar tmpDir=null
13/02/17 14:15:33 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
13/02/17 14:15:33 INFO mapred.FileInputFormat: Total input paths to process
: 1
13/02/17 14:15:33 INFO streaming.StreamJob: getLocalDirs():
[/tmp/hadoop-root/mapred/local]
13/02/17 14:15:33 INFO streaming.StreamJob: Running job:
job_201302161828_0005
13/02/17 14:15:33 INFO streaming.StreamJob: To kill this job, run:
13/02/17 14:15:33 INFO streaming.StreamJob: UNDEF/bin/hadoop job
-Dmapred.job.tracker=xxx.xx.xx.xxx:8021 -kill job_201302161828_0005
13/02/17 14:15:33 INFO streaming.StreamJob: Tracking URL:
http://xxx.xx.xx.xxx:50030/jobdetails.jsp?jobid=job_201302161828_0005
13/02/17 14:15:34 INFO streaming.StreamJob: map 0% reduce 0%
13/02/17 14:15:52 INFO streaming.StreamJob: map 50% reduce 0%
13/02/17 14:15:55 INFO streaming.StreamJob: map 0% reduce 0%
13/02/17 14:16:17 INFO streaming.StreamJob: map 100% reduce 100%
13/02/17 14:16:17 INFO streaming.StreamJob: To kill this job, run:
13/02/17 14:16:17 INFO streaming.StreamJob: UNDEF/bin/hadoop job
-Dmapred.job.tracker=xxx.xx.xx.xxx:8021 -kill job_201302161828_0005
13/02/17 14:16:17 INFO streaming.StreamJob: Tracking URL:
http://xxx.xx.xx.xxx:50030/jobdetails.jsp?jobid=job_201302161828_0005
13/02/17 14:16:17 ERROR streaming.StreamJob: Job not successful. Error: NA
13/02/17 14:16:17 INFO streaming.StreamJob: killJob...
Streaming Command Failed!


Log:

2013-02-17 14:19:30,383 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201302161828_0006_m_000001_3: java.lang.RuntimeException:
PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at
org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at
org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.Child.main(Child.java:262)


Many thanks

Best regards

--


  • Harsh J at Feb 17, 2013 at 2:42 pm
    A few questions:

    1. Is your mapper.py marked executable (i.e. does it have the +x bit set)?
    2. Since you pass -mapper <py file>, does your .py file carry a shebang
    line such as "#!/usr/bin/env python" to indicate it's to be run via the
    Python interpreter?
    3. Is Python installed on all the nodes in your cluster?
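Points 1 and 2 can be checked locally before submitting the job. A minimal sketch (the mapper body here is a hypothetical pass-through, not the original mapper2.py):

```shell
# A minimal streaming mapper; Hadoop runs it as a standalone program,
# so the shebang line must name the interpreter
cat > mapper2.py <<'EOF'
#!/usr/bin/env python3
import sys
for line in sys.stdin:
    sys.stdout.write(line)
EOF

# Without the executable bit, the streaming task cannot launch the script
chmod +x mapper2.py

# Local smoke test, equivalent to: cat Test3.csv | /home/mapper2.py
printf 'a,1\nb,2\n' | ./mapper2.py
```

On a 2013-era cluster the shebang would typically be `#!/usr/bin/env python`, as suggested above.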
    --
    Harsh J

  • Mukhtaj Khan at Feb 17, 2013 at 3:54 pm
    Thanks Harsh for the reply.

    I have done all the steps you mentioned. My program executes without any
    error when the input file is small (500 bytes), but when I increase the
    size of the file (5.1 KB) it gives me the error message. I have
    configured Hadoop on Oracle VirtualBox.

    Please let me know if you have any other solution.

    Many thanks




  • Mukhtaj Khan at Feb 17, 2013 at 4:42 pm
    Hi Harsh,

    Once again thanks, I have solved the problem. Basically, the problem was
    in the data format. I am processing a CSV file and the data points were
    not uniform. I made the data points uniform (i.e. all data points have
    the same number of decimal places) and now it works fine.
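For reference, that kind of uniform formatting can be enforced before the job runs. A minimal sketch (the comma delimiter and the 8-decimal format are assumptions, not taken from the original data):

```python
# Rewrite every numeric field of a CSV line to a fixed number of decimal
# places, so all data points share a uniform format (hypothetical 8 decimals)
def normalize_line(line, decimals=8):
    fields = line.strip().split(",")
    return ",".join("%.*f" % (decimals, float(f)) for f in fields)

print(normalize_line("0.00029443,1.5"))  # -> 0.00029443,1.50000000
```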

    I have another issue, related to the output file. I expected a vector of
    6000 data points, but the output file shows only a few data points and
    elides the middle with ..... as:

    [[ 0.00029443]
    [ 0.00031433]
    [ 0.00032848]
    ...,
    [ 0. ]
    [ 0. ]
    [ 0. ]]


    How can I show all the data points? I need all of them.
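The elided rows (`...,`) look like NumPy's print truncation rather than missing data. A sketch, assuming the mapper builds a NumPy array and prints it:

```python
import sys
import numpy as np

# A 6000-point column vector, like the output above
a = np.zeros((6000, 1))

# By default NumPy elides the middle of large arrays when printing
print("..." in np.array2string(a))  # True: middle rows are hidden

# Raise the threshold so every element is printed
np.set_printoptions(threshold=sys.maxsize)
print("..." in np.array2string(a))  # False: all 6000 rows are shown
```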

    Another thing: how can I restrict the application to produce only one
    output file (part-00000)?
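With zero reducers, streaming writes one part file per map task, so multiple input splits yield multiple files. One common workaround (a sketch, not from this thread; the output path is hypothetical) is to run a single identity reducer so all map output is funneled into one part-00000:

```shell
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming*.jar \
    -D mapred.reduce.tasks=1 \
    -file /home/mapper2.py \
    -mapper /home/mapper2.py \
    -reducer /bin/cat \
    -input /user/root/Test3.csv \
    -output output-31
```

Note that this forces a shuffle and sort, so the output ordering may differ from the map-only run.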

    Many thanks again.

    Best regards,



Discussion Overview

group: cdh-user
posted: Feb 17, 2013 at 2:31 PM
active: Feb 17, 2013 at 4:42 PM
posts: 4
users: 2 (Mukhtaj Khan: 3 posts, Harsh J: 1 post)
website: cloudera.com
irc: #hadoop