Can anyone advise on how to ship a Ruby program with the `DEFINE ... SHIP()` command when the Ruby program lives on S3 or in HDFS instead of on the local disk?
This Pig script runs fine on a single-node Hadoop installation on my local machine. Note that the Ruby program is sourced from my local disk:
messages = LOAD 'msg.tsv'; -- msg.tsv is in HDFS
DEFINE message_to_words `words.rb` SHIP('words.rb'); -- words.rb is in my local computer
words = STREAM messages THROUGH message_to_words;
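For context, `words.rb` is a hypothetical streaming script along these lines: Pig pipes each input tuple to the script's STDIN as a tab-separated line, and each line the script writes to STDOUT becomes an output tuple (the exact field splitting here is an assumption, not the actual script):

```ruby
#!/usr/bin/env ruby
# Sketch of a streaming script like words.rb (assumed, not the original).
# Pig's STREAM operator pipes each tuple in as a line on STDIN;
# every line written to STDOUT becomes one output tuple.

# Split a line into non-empty whitespace-delimited words.
def split_into_words(line)
  line.chomp.split(/\s+/).reject(&:empty?)
end

# Emit one word per output line.
STDIN.each_line do |line|
  split_into_words(line).each { |word| puts word }
end
```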
However, I am trying to run this on an Amazon Elastic MapReduce cluster, which means I have to ship the script from either S3 or HDFS. None of the following attempts worked for me:
copyToLocal s3://bucketname/words.rb /home/hadoop/words.rb -- copy to local drive
cp s3://bucketname/words.rb words.rb -- copy to HDFS
DEFINE message_to_words `words.rb` SHIP('hdfs:///words.rb'); -- not working
DEFINE message_to_words `words.rb` SHIP('S3://bucketname/words.rb'); -- not working
DEFINE message_to_words `words.rb` SHIP('words.rb'); -- not working
Can anyone advise on the proper SHIP() syntax for this case?