Hi Group,

I want to know how I can execute an R MapReduce script (say, WordCount.R)
from Oozie. I tried creating a simple workflow from Oozie Editor/Dashboard
-> Workflows. There I created a Shell action and gave it the Path to Execute
Shell Command (the HDFS path of my WordCount.R) and, in Add Files, the same
HDFS path of WordCount.R.
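
For readers following along, the workflow the editor generates might look
roughly like this (a hypothetical sketch: the action name and HDFS path are
placeholders, and Rscript/R must be installed on the task nodes for the
script to run):

```xml
<!-- Hypothetical shell action; names and paths are illustrative only. -->
<action name="run-wordcount">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>WordCount.R</exec>
        <!-- Ships the script from HDFS into the task's working directory -->
        <file>/user/someuser/WordCount.R#WordCount.R</file>
    </shell>
    <ok to="end"/>
    <error to="kill"/>
</action>
```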

After saving, when I submit the workflow, it gives me the following error:

>>> Invoking Shell command line now >>
Stdoutput log4j:ERROR Could not find value
for key log4j.appender.TLA
Stdoutput log4j:ERROR Could not instantiate
appender named "TLA".
Stdoutput packageJobJar:
[/tmp/RtmpnscKd7/rmr-local-env6d124390c901,
/tmp/RtmpnscKd7/rmr-global-env6d126e1cf2f8,
/tmp/RtmpnscKd7/rmr-streaming-map6d12396c8c9e,
/tmp/RtmpnscKd7/rmr-streaming-reduce6d127bccd0ee,
/tmp/RtmpnscKd7/rmr-streaming-combine6d127e57be8f,
/tmp/hadoop-mapred/hadoop-unjar8376146431259627088/] []
/tmp/streamjob3260548049415293781.jar tmpDir=null
Stdoutput log4j:WARN No appenders could be
found for logger (org.apache.hadoop.security.UserGroupInformation).
Stdoutput log4j:WARN Please initialize the
log4j system properly.
Stdoutput log4j:WARN See
http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Stdoutput Streaming Command Failed!
Exit code of the Shell command 1
<<< Invocation of Shell command completed <<<
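
As the last two lines of the log show, the action fails because the wrapped
command exits non-zero. A minimal sketch of how the Oozie shell action treats
that exit status (run_action is a hypothetical stand-in for the launcher's
wrapper, not a real Oozie function):

```shell
#!/bin/sh
# Hypothetical stand-in for the Oozie launcher: run the command, report its
# exit status, and propagate it so a non-zero code fails the action.
run_action() {
    "$@"
    code=$?
    echo "Exit code of the Shell command $code"
    return $code
}

# 'false' stands in for the failing streaming job (exit code 1).
run_action false || echo "action marked FAILED"
```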

And the execution fails.
Please note: if I execute a plain R script (e.g., one containing just a print
statement), it works fine from the Oozie shell action. But if it is an rmr
script, it throws the above error for any MapReduce code.

Can someone explain the right way to execute an rmr script from Oozie?

Thanks,
Gaurav


  • Abraham Elmahrek at Mar 13, 2013 at 6:41 am
    Gaurav,

    The error below suggests that whatever the R code is executing relies on a
    log4j appender called TLA. Make sure that appender is defined in
    '/etc/hadoop/conf/log4j.properties'. Take a look at
    https://github.com/apurtell/ec2-demo/blob/master/bin/image/tarball/setup-remote
    to see how it is set up.
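
    A stock Hadoop 1.x configuration defines TLA as a TaskLogAppender; a
    sketch of the relevant lines (mirroring Hadoop's bundled
    log4j.properties, so verify against your version):

```properties
# Sketch of the TLA appender as shipped in Hadoop's stock log4j.properties.
log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```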

    Looping in CDH-User@ to see if anyone else knows. You may also want to
    check with the RHadoop folks on their mailing list.

    -Abe
  • Gaurav Dasgupta at Mar 13, 2013 at 11:54 am
    Hi,

    I have added the log4j properties to the file across the cluster.
    Re-submitting the rmr job now gives the following error log:

    >>> Invoking Shell command line now >>
    Stdoutput packageJobJar: [/tmp/RtmpvtK0yJ/rmr-local-env4aea19a74b34,
    /tmp/RtmpvtK0yJ/rmr-global-env4aea54d99d2b,
    /tmp/RtmpvtK0yJ/rmr-streaming-map4aeaf8fe2cd,
    /tmp/RtmpvtK0yJ/rmr-streaming-reduce4aea30f2670a,
    /tmp/RtmpvtK0yJ/rmr-streaming-combine4aea5b333510,
    /tmp/hadoop-mapred/hadoop-unjar3304276294857728715/] []
    /tmp/streamjob7267683326241104931.jar tmpDir=null
    Stdoutput Streaming Command Failed!
    Exit code of the Shell command 1
    <<< Invocation of Shell command completed <<<

    Any idea what's going wrong?
    Please note: I can run the same rmr job manually from the terminal and it
    runs successfully.

    Thanks,
    Gaurav
  • Abraham Elmahrek at Mar 13, 2013 at 5:25 pm
    Hey Gaurav,

    Take a look at the Oozie logs and the MapReduce JobTracker and TaskTracker
    logs. There should be more information there.

    -Abe
  • Romain Rigaux at Mar 13, 2013 at 6:29 pm
    The shell command is executed as the 'mapred' user on a non-secure cluster.

    Does it work on the CLI when you run it as 'mapred'? e.g. 'sudo -u mapred
    hadoop ....'

    As it spawns a new MR job, you should find more information about the
    failure in the job just above 'oozie:launcher:T=shell:...' in JobBrowser.
    Just click the log icon in the 'Logs' tab to see the logs.

    Romain


  • Romain Rigaux at Mar 13, 2013 at 6:40 pm
    You can also probably run it as a streaming action instead of a shell
    action (for the input/output parameters, see
    https://issues.cloudera.org/browse/HUE-1035?focusedCommentId=15833&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15833
    ).
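
    A streaming action for an R mapper/reducer pair might be sketched like
    this (hypothetical: the action and file names, paths, and the use of
    Rscript are assumptions; note this runs plain R scripts through Hadoop
    streaming directly rather than letting rmr submit the job itself):

```xml
<!-- Hypothetical streaming action; names, paths and parameters are
     illustrative only. -->
<action name="wordcount-streaming">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <streaming>
            <mapper>Rscript mapper.R</mapper>
            <reducer>Rscript reducer.R</reducer>
        </streaming>
        <configuration>
            <property>
                <name>mapred.input.dir</name>
                <value>${inputDir}</value>
            </property>
            <property>
                <name>mapred.output.dir</name>
                <value>${outputDir}</value>
            </property>
        </configuration>
        <file>mapper.R#mapper.R</file>
        <file>reducer.R#reducer.R</file>
    </map-reduce>
    <ok to="end"/>
    <error to="kill"/>
</action>
```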

    Romain

  • Romain Rigaux at Mar 13, 2013 at 6:47 pm
    Sorry for the spam, but you would need to ship the Hadoop streaming jar
    if you use the 'Shell' action. In the case of the 'Streaming' action, you
    would need to install the Oozie sharelib:
    https://ccp.cloudera.com/display/CDH4DOC/Oozie+Installation#OozieInstallation-InstallShareLib
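
    For the CDH4-era packages, installing the sharelib amounted to unpacking
    the bundled tarball and uploading it to HDFS; a hedged sketch (cluster-side
    commands, not runnable standalone, and paths vary by version, so follow
    the linked doc):

```shell
# Cluster-side sketch; the tarball location and HDFS path are assumptions.
mkdir /tmp/ooziesharelib && cd /tmp/ooziesharelib
tar xzf /usr/lib/oozie/oozie-sharelib.tar.gz
sudo -u oozie hadoop fs -put share /user/oozie/share
```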

    Romain

  • Gaurav Dasgupta at Mar 14, 2013 at 7:33 am
    Hi All,

    Thanks for all the replies. The issue got solved.
    It was a permission issue (mapred was unable to write to the "/user" HDFS
    directory). This was because, I think, I was running the job as some user
    and the staging directory is set to "/user/${user.name}".
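
    For reference, the usual fix for this class of failure is to create the
    submitting user's HDFS home directory and give that user ownership (a
    cluster-side sketch, not runnable standalone; 'someuser' and the 'hdfs'
    superuser are placeholders for your setup):

```shell
# Cluster-side sketch; 'someuser' is a placeholder for the submitting user.
sudo -u hdfs hadoop fs -mkdir /user/someuser
sudo -u hdfs hadoop fs -chown someuser /user/someuser
```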

    But the output for the job was set to "/tmp/wc-output" and it generated
    the output there. What was mapred trying to write in "/user" when
    submitting via the Oozie shell action? This doesn't happen if I simply
    run the job as a shell command from the terminal.

    Thanks,
    Gaurav
  • Romain Rigaux at Mar 14, 2013 at 6:42 pm
    Oozie uses this directory as a temporary dir when you submit a job, e.g.:
    /user/bob/oozie-oozi

    When a job is running, you can see temporary data. When the job is
    finished, the content is deleted.

    Romain

Discussion Overview
group: hue-user
categories: hadoop
posted: Mar 13, '13 at 6:20a
active: Mar 14, '13 at 6:42p
posts: 9
users: 3
website: cloudera.com
irc: #hadoop
