Grokbase Groups: Pig user, April 2010
Hi,

Can anyone advise on how to ship a Ruby program with the "DEFINE ... SHIP()" command when the Ruby program actually lives in S3 or HDFS instead of on the local disk?

This Pig script runs fine on a single-node Hadoop installation on my local computer. Note that the Ruby program is read from my local disk:

messages = LOAD 'msg.tsv'; -- msg.tsv is in HDFS
DEFINE message_to_words `words.rb` SHIP('words.rb'); -- words.rb is in my local computer
words = STREAM messages THROUGH message_to_words;
dump words;
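(For context, words.rb itself never appears in the thread. Pig streaming hands each input tuple to the program as a tab-delimited line on stdin and reads one output tuple per line from stdout, so a minimal word splitter compatible with that contract might look like the following sketch, purely for illustration:)

#!/usr/bin/env ruby
# Illustrative stand-in for words.rb (the real script is not shown).
# Pig streaming: one tab-delimited input tuple per stdin line;
# each stdout line becomes one output tuple.
STDIN.each_line do |line|
  line.chomp.split(/\s+/).each do |word|
    puts word unless word.empty?
  end
end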

However, I am trying to run this on an Amazon Elastic MapReduce cluster, which means I have to ship the program from either S3 or HDFS. None of the following copy and SHIP attempts worked for me:

copyToLocal s3://bucketname/words.rb /home/hadoop/words.rb -- copy to local drive
cp s3://bucketname/words.rb words.rb -- copy to HDFS

DEFINE message_to_words `words.rb` SHIP('hdfs:///words.rb'); -- not working
DEFINE message_to_words `words.rb` SHIP('S3://bucketname/words.rb'); -- not working
DEFINE message_to_words `words.rb` SHIP('words.rb'); -- not working

Can anyone advise on the proper SHIP() syntax?

Thanks,
Chiew


  • Rekha Joshi at Apr 6, 2010 at 6:21 am
I'm not sure what error shows up in the Pig logs when you say the ship failed, so I'm hazarding a guess.

Assuming permissions are alright: sometimes the Ruby executable on the compute nodes is not the same as on your local machine, so you might need to check the Ruby path.
Also, the file needs to be there locally to ship; if copyToLocal has not worked, try hadoop fs -get. In the SHIP command, you may also try giving the fully qualified local path to words.rb, as in:

    DEFINE message_to_words `/path/to/ruby words.rb` SHIP('/home/Cflocalpath/words.rb');
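Concretely, the suggestion above might play out like this on the machine running Pig. SHIP() expects a path on the local filesystem of the machine that launches the script, not an hdfs:// or s3:// URI, so the file has to be pulled down first; the bucket name and the /usr/bin/ruby and /home/hadoop paths here are illustrative:

hadoop fs -get s3://bucketname/words.rb /home/hadoop/words.rb
DEFINE message_to_words `/usr/bin/ruby /home/hadoop/words.rb` SHIP('/home/hadoop/words.rb');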

    Cheers,
    /

  • CF at Apr 7, 2010 at 2:57 am
    Hi Rekha,

Thank you very much for your response. You've pointed me in the right direction.

Just for reference, I've successfully shipped the Ruby code on an Amazon Elastic MapReduce Pig run by doing:

    messages = LOAD '$S3PATH/msg.tsv';
    cp $S3PATH/words.rb words.rb -- copy to HDFS first
    copyToLocal words.rb /home/hadoop/words.rb -- copy from HDFS into local amazon instance
    DEFINE message_to_words `ruby -Ku /home/hadoop/words.rb` SHIP('/home/hadoop/words.rb');
    words = STREAM messages THROUGH message_to_words;
    dump words;
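(The -Ku flag tells 1.8-era Ruby to treat source and I/O as UTF-8. Presumably $S3PATH is supplied at launch time through Pig's parameter substitution; with the stock pig client that would look something like the line below, where the script file name and bucket are hypothetical:)

pig -param S3PATH=s3://bucketname words_job.pig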

    cheers,
    Chiew
