Hi
I wanted to try out Hadoop streaming and got the sample Python code for the mapper and reducer. I copied both onto my local file system (LFS) and tried running the streaming job as mentioned in the documentation.
Here is the command I used to run the job:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar
-input /userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output
-mapper /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py
-reducer /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py

Here, everything other than the input and output is at a local file system location. However, the job is failing. The error log from the JobTracker web UI is as follows:

java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:386)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 17 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
... 22 more
Caused by: java.io.IOException: Cannot run program "/home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py": java.io.IOException: error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
... 23 more
Caused by: java.io.IOException: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
... 24 more

Based on the error, I checked the permissions of the mapper and reducer, and issued a chmod 777 on both as well. Still no luck.

The permissions of the files are as follows:
cloudera@cloudera-vm:~$ ls -l /home/cloudera/bejoy/apps/inputs/wc/
-rwxrwxrwx 1 cloudera cloudera 707 2011-09-11 23:42 WcStreamMap.py
-rwxrwxrwx 1 cloudera cloudera 1077 2011-09-11 23:42 WcStreamReduce.py

I'm testing this on the Cloudera demo VM, so the Hadoop setup is in pseudo-distributed mode. Any help would be highly appreciated.

Thank You

Regards
Bejoy.K.S


  • Jeremy Lewi at Sep 12, 2011 at 1:22 pm
    I would suggest you try putting your mapper/reducer .py files in a directory
    that is world-readable at every level, e.g. /tmp/test. I had similar
    problems when I was using streaming, and I believe my workaround was to put
    the mappers/reducers outside my home directory. The other, more involved
    alternative is to set up the Linux task controller so you can run your MR
    jobs as the user who submits them.
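
    For example, something like this (just a rough sketch; the paths are the
    ones from your earlier mail):

    mkdir -p /tmp/test
    cp /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py /tmp/test/
    cp /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py /tmp/test/
    chmod 755 /tmp/test/*.py
    ls -ld /tmp /tmp/test    # every directory on the path must be world readable/executable

    and then point -mapper and -reducer at the /tmp/test copies.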

    J
  • Bejoy KS at Sep 12, 2011 at 3:28 pm
    Thanks Jeremy. I tried your first suggestion and the mappers ran to
    completion. But then the reducers failed with another exception related to
    pipes. I believe it may be due to permission issues again. I tried setting a
    few additional config parameters, but that didn't do the job. Please find
    below the command used and the error log from the JobTracker web UI.

    hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar
    -D hadoop.tmp.dir=/home/streaming/tmp/hadoop/
    -D dfs.data.dir=/home/streaming/tmp
    -D mapred.local.dir=/home/streaming/tmp/local
    -D mapred.system.dir=/home/streaming/tmp/system
    -D mapred.temp.dir=/home/streaming/tmp/temp
    -input /userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output
    -mapper /home/streaming/WcStreamMap.py
    -reducer /home/streaming/WcStreamReduce.py


    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)


    The folder permissions at the time of job execution are as follows

    cloudera@cloudera-vm:~$ ls -l /home/streaming/
    drwxrwxrwx 5 root root 4096 2011-09-12 05:59 tmp
    -rwxrwxrwx 1 root root 707 2011-09-11 23:42 WcStreamMap.py
    -rwxrwxrwx 1 root root 1077 2011-09-11 23:42 WcStreamReduce.py

    cloudera@cloudera-vm:~$ ls -l /home/streaming/tmp/
    drwxrwxrwx 2 root root 4096 2011-09-12 06:12 hadoop
    drwxrwxrwx 2 root root 4096 2011-09-12 05:58 local
    drwxrwxrwx 2 root root 4096 2011-09-12 05:59 system
    drwxrwxrwx 2 root root 4096 2011-09-12 05:59 temp

    Am I missing something here?

    I haven't been on Linux for long, so I couldn't try your second suggestion
    of setting up the Linux task controller.

    Thanks a lot

    Regards
    Bejoy.K.S


  • Jeremy Lewi at Sep 13, 2011 at 3:37 am
    Bejoy,

    The other problem I typically ran into with Python streaming jobs was my
    mapper or reducer writing to stdout. Since Hadoop streaming uses stdout to
    pass data from your script back to the framework, any erroneous "print"
    statements will break the pipe. The easiest way around this is to redirect
    "stdout" to "stderr" at the entry point of your mapper and reducer; do this
    even before you import any modules, so that even if those modules call
    "print" the output gets redirected.

    Note: if you're using Dumbo (but I don't think you are) the above solution
    may not work, but I can send you a pointer.

    J
  • Bejoy KS at Sep 13, 2011 at 7:42 am
    Thanks Jeremy, but I didn't follow 'redirect "stdout" to "stderr" at the
    entry point to your mapper and reducer'.
    Basically I'm a Java Hadoop developer and have no background in Python
    programming. Could you please help me with more details, like the line of
    code I need to include to achieve this?

    Also, I drilled down deeper into my error logs and found the following
    lines as well:

    *stderr logs*

    /usr/bin/env: python
    : No such file or directory
    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
    log4j:WARN Please initialize the log4j system properly.

    I verified that '/usr/bin/env' exists on the machine, and it does.

    Could you please provide a little more guidance on the same?


  • Harsh J at Sep 13, 2011 at 8:06 am
    The env binary would be present, but do all your TaskTracker (TT) nodes
    have Python properly installed on them? The env program can't find it, and
    that's probably why your scripts with the shebang don't run.
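
    As a quick check (just a sketch; run it on each TT node):

    which python             # should print a path such as /usr/bin/python
    /usr/bin/env python -V   # should print the Python version

    If Python is found, it may also be worth inspecting the script's shebang
    line; a stray carriage return after "python" (DOS line endings) can produce
    exactly this kind of split "/usr/bin/env: python / : No such file or
    directory" message:

    head -n 1 /home/streaming/WcStreamMap.py | cat -A   # a trailing ^M$ indicates DOS line endings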
    --
    Harsh J
  • Bejoy KS at Sep 13, 2011 at 8:43 am
    Hi Harsh
    Thank you for the response. I'm on the Cloudera demo VM. It runs
    Hadoop 0.20 and has Python installed. Do I have to do any further
    installation/configuration to get Python running?
  • Jeremy Lewi at Sep 13, 2011 at 8:12 pm
    Bejoy, to redirect stdout, add the lines

    import sys
    sys.stdout = sys.stderr   # anything printed from here on goes to stderr

    to the top of your .py files (i.e. right after the shebang line).
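
    To make it concrete, here is a minimal sketch of what a streaming
    word-count mapper with that redirect could look like (hypothetical, not
    your actual WcStreamMap.py; note it keeps a handle on the real stdout for
    emitting the key/value output):

    #!/usr/bin/env python
    import sys
    sys.stdout = sys.stderr      # stray print statements now go to stderr, not the job's output

    def main():
        out = sys.__stdout__     # the original stdout, reserved for key/value records
        for line in sys.stdin:
            for word in line.split():
                out.write('%s\t1\n' % word)

    if __name__ == '__main__':
        main()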

    J
    On Tue, Sep 13, 2011 at 1:42 AM, Bejoy KS wrote:

    Hi Harsh
    Thank You for the response. I'm on Cloudera demo VM. It is on
    hadoop 0.20 and has python installed. Do I have to do any further
    installation/configuration to get python running?

    On Tue, Sep 13, 2011 at 1:36 PM, Harsh J wrote:

    The env binary would be present, but do all your TT nodes have python
    properly installed on them? The env program can't find them and that's
    probably why your scripts with shbang don't run.
    On Tue, Sep 13, 2011 at 1:12 PM, Bejoy KS wrote:
    Thanks Jeremy. But I didn't follow 'redirect "stdout" to "stderr" at the
    entry point to your mapper and reducer'.
    Basically I'm a java hadoop developer and has no idea on python
    programming.
    Could you please help me with mode details like the line of code i need to
    include to achieve this.

    Also I tried a still more deep drill down on my error logs and found the
    following line as well

    stderr logs

    /usr/bin/env: python
    : No such file or directory
    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
    failed with code 127
    at
    org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at
    org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at
    org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    at
    org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at
    org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
    log4j:WARN No appenders could be found for logger
    (org.apache.hadoop.hdfs.DFSClient).
    log4j:WARN Please initialize the log4j system properly.

    I verified on the existence of such a directory and it was present
    '/usr/bin/env' .

    Could you please provide little more guidance on the same.


    On Tue, Sep 13, 2011 at 9:06 AM, Jeremy Lewi wrote:

    Bejoy,
    The other problem I typically ran into using Python streaming jobs was
    when my mapper or reducer wrote to stdout. Since the script uses stdout
    to pass data back to Hadoop, any erroneous "print" statements will cause
    the pipe to break. The easiest way around this is to redirect "stdout"
    to "stderr" at the entry point to your mapper and reducer; do this even
    before you import any modules, so that even if those modules call
    "print" the output gets redirected.
    Note: if you're using dumbo (but I don't think you are) the above
    solution may not work, but I can send you a pointer.
    J

    On Mon, Sep 12, 2011 at 8:27 AM, Bejoy KS <bejoy.hadoop@gmail.com>
    wrote:
    Thanks Jeremy. I tried your first suggestion and the mappers ran to
    completion, but then the reducers failed with another exception related
    to pipes. I believe it may be due to permission issues again. I tried
    setting a few additional config parameters, but that didn't do the job.
    Please find below the command used and the error logs from the
    jobtracker web UI.

    hadoop jar
    /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar
    -D hadoop.tmp.dir=/home/streaming/tmp/hadoop/ -D
    dfs.data.dir=/home/streaming/tmp -D
    mapred.local.dir=/home/streaming/tmp/local -D
    mapred.system.dir=/home/streaming/tmp/system -D
    mapred.temp.dir=/home/streaming/tmp/temp -input
    /userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output
    -mapper /home/streaming/WcStreamMap.py -reducer
    /home/streaming/WcStreamReduce.py
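
    (A side note: properties such as dfs.data.dir, mapred.local.dir and
    mapred.system.dir are read by the HDFS and MapReduce daemons at startup,
    so passing them per-job with -D is unlikely to take effect.)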


    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
    failed with code 127
    at
    org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at
    org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at
    org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    at
    org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at
    org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
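
    (Exit code 127 conventionally means "command not found", which matches
    the earlier /usr/bin/env: python failure rather than a pure permissions
    problem; a permission-denied failure would typically surface as exit
    code 126.)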


    The folder permissions at the time of job execution are as follows:

    cloudera@cloudera-vm:~$ ls -l /home/streaming/
    drwxrwxrwx 5 root root 4096 2011-09-12 05:59 tmp
    -rwxrwxrwx 1 root root 707 2011-09-11 23:42 WcStreamMap.py
    -rwxrwxrwx 1 root root 1077 2011-09-11 23:42 WcStreamReduce.py

    cloudera@cloudera-vm:~$ ls -l /home/streaming/tmp/
    drwxrwxrwx 2 root root 4096 2011-09-12 06:12 hadoop
    drwxrwxrwx 2 root root 4096 2011-09-12 05:58 local
    drwxrwxrwx 2 root root 4096 2011-09-12 05:59 system
    drwxrwxrwx 2 root root 4096 2011-09-12 05:59 temp

    Am I missing something here?

    I haven't been using Linux for long, so I couldn't try your second
    suggestion on setting up the Linux task controller.

    Thanks a lot

    Regards
    Bejoy.K.S


    On Mon, Sep 12, 2011 at 6:20 AM, Jeremy Lewi wrote:

    I would suggest you try putting your mapper/reducer .py files in a
    directory that is world-readable at every level, e.g. /tmp/test. I had
    similar problems when I was using streaming, and I believe my workaround
    was to put the mappers/reducers outside my home directory. The other,
    more involved, alternative is to set up the Linux task controller so you
    can run your MR jobs as the user who submits them.
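
    A quick way to confirm that every directory level is traversable (a
    sketch using the original paths from this thread; each component needs
    at least the execute bit for "other"):

    ls -ld /home /home/cloudera /home/cloudera/bejoy \
        /home/cloudera/bejoy/apps /home/cloudera/bejoy/apps/inputs \
        /home/cloudera/bejoy/apps/inputs/wc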
    J

