I'd like to pass a whole Ruby app to streaming, so that I don't need to
bother with Ruby file dependencies.

For example,

./streaming

...
-mapper 'ruby aaa/bbb/ccc'
-files aaa <--- pass the folder




Is this supported already? If not, any tips on how to make it work? I'm
willing to add some code myself and rebuild the streaming jar.


  • Nick Jones at Jun 28, 2011 at 4:26 pm
    Take a look at Wukong from the guys at Infochimps:
    https://github.com/mrflip/wukong
    --
    Nick Jones
  • Abhinay Mehta at Jun 28, 2011 at 4:36 pm
    We use Mandy: https://github.com/forward/mandy for this.


  • Guang-Nan Cheng at Jun 29, 2011 at 6:45 am
    Well, my bad. I made a simple test and confirmed that -files works that way
    already.

    To the two people who "answered" my question: sorry that I asked it
    unclearly... I don't see how those two projects relate to the question,
    but thank you. :D
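
The confirmation above (that -files accepts a folder) can be sketched as a small command builder. This is a minimal Python sketch, not something posted in the thread: the function name `build_streaming_command`, the jar location, and the input and output paths are all hypothetical. It assumes the usual Hadoop rule that generic options such as -files are placed before the streaming options.

```python
import shlex


def build_streaming_command(streaming_jar, app_dir, mapper, input_path, output_path):
    """Return the argv list for a Hadoop Streaming job that ships app_dir."""
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options such as -files go before the streaming options.
        "-files", app_dir,
        "-input", input_path,
        "-output", output_path,
        # The mapper path is relative to the task's working directory,
        # where the shipped folder is expected to appear under its own name.
        "-mapper", mapper,
    ]


if __name__ == "__main__":
    argv = build_streaming_command(
        streaming_jar="hadoop-streaming.jar",   # hypothetical jar location
        app_dir="aaa",
        mapper="ruby aaa/bbb/ccc",
        input_path="/user/example/input",       # hypothetical HDFS path
        output_path="/user/example/output",     # hypothetical HDFS path
    )
    # Print a copy-pasteable shell command instead of running it.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the argument list rather than a single string keeps the quoted mapper command ('ruby aaa/bbb/ccc') intact as one argument.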




  • Paul Ingles at Jun 29, 2011 at 7:00 am
    Hi,

    I'm not familiar with Wukong, but Mandy has some scripts that wrap the hadoop commands; the default behaviour, IIRC, is to package the folder the script is in.

    This is then distributed so the app carries all its dependencies with it.
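
The general idea of packaging a script's folder so that it travels with its dependencies might look like the following. This is a minimal Python sketch of the concept only, not Mandy's actual implementation; the helper name `package_folder` is hypothetical. It assumes the archive is written outside the folder being packaged.

```python
import os
import zipfile


def package_folder(folder, archive_path):
    """Zip every file under folder, storing paths relative to its parent."""
    folder = os.path.abspath(folder)
    parent = os.path.dirname(folder)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for dirpath, _dirnames, filenames in os.walk(folder):
            for name in sorted(filenames):
                full_path = os.path.join(dirpath, name)
                # Keep the top-level folder name so relative paths such as
                # aaa/bbb/ccc still resolve after unpacking.
                archive.write(full_path, os.path.relpath(full_path, parent))
    return archive_path
```

Such an archive could then be shipped with the job, for example through Hadoop's generic -archives option, though passing the folder directly with -files (as confirmed earlier in the thread) avoids the packaging step.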

    Happy to hear -files works for you.

  • Shi Yu at Apr 10, 2012 at 9:59 pm
    Hi,

    I'm reviving this old thread in the hope of finding a solution to my
    problem. I am using Hadoop 0.20.203 streaming with a C++ program. The
    program loads many dictionaries stored in local folders. For example,

    mainfolder - dir1 -> dicfile 1
    mainfolder - dir1 -> dicfile 2
    mainfolder - dir2 -> dicfile 3
    mainfolder - dir2 -> dicfile 4

    I didn't change the dictionary-loading functions in the C++ code, on the
    assumption that the whole directory at the mainfolder level could be
    passed to streaming. However, this does not seem to work, because I
    observed the following error:

    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)


    It seems the program failed to load the dictionaries. What is the most
    efficient way to pass multiple files with directory dependencies in
    Hadoop streaming? I assume I don't need to change the C++ code, or
    should I remove all the directory dependencies from the dictionary loading?

    Thanks!

    Shi
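
The thread contains no reply to this question. The error itself only says that the mapper subprocess exited with code 1, so a first step could be to check which of the expected relative paths actually exist in the task's working directory. The following is a minimal Python debugging sketch, not a fix: the helper name `missing_paths` and the listed paths are hypothetical, and it assumes that whatever the script writes to stderr ends up in the task logs.

```python
import os
import sys


def missing_paths(expected, base_dir="."):
    """Return the entries of expected that do not exist under base_dir."""
    return [p for p in expected if not os.path.exists(os.path.join(base_dir, p))]


if __name__ == "__main__":
    # Hypothetical relative paths; replace with the ones the program opens.
    expected = ["mainfolder/dir1", "mainfolder/dir2"]
    # Report on stderr so the diagnostics stay out of the mapper's output.
    print("cwd: " + os.getcwd(), file=sys.stderr)
    for path in missing_paths(expected):
        print("missing: " + path, file=sys.stderr)
```

If every path is reported as present, the failure presumably lies elsewhere, for instance in how the program itself resolves its paths.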

Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 28, '11 at 4:20p
active: Apr 10, '12 at 9:59p
posts: 6
users: 5
website: hadoop.apache.org...
irc: #hadoop
