Sep 2, 2009 at 3:43 pm
You only need to REGISTER the UDF jar, and Pig will distribute it to the
cluster for you. Each time you submit the Pig script, Pig ships the UDF jar
to the cluster nodes.
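
For example, a minimal sketch (myudfs.jar and the myudfs.UPPER function are
hypothetical names, not from this thread):

    -- minimal sketch: REGISTER makes the jar's UDFs available, and Pig
    -- distributes the jar to the cluster when the job is submitted
    REGISTER myudfs.jar;
    A = LOAD 'input.txt' AS (line:chararray);
    B = FOREACH A GENERATE myudfs.UPPER(line);
    STORE B INTO 'output';

No separate ship/cache step is needed for the jar itself.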
From: zaki rahaman
Sent: September 2, 2009 7:49
Subject: Re: UDFs and Amazon Elastic MapReduce
Apologies for re-posting, but I never got an answer to my question.
Basically, when using UDF jar files, how do you go about ensuring that the
jar file is replicated on all nodes in the cluster and that each node uses
its own local copy of the jar rather than the 'master' copy (to avoid
unnecessary network traffic and bandwidth issues)? It looks like this is
accomplished via a DEFINE + ship/cache statement, but I'm not sure which one
is necessary.
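
For reference, the syntax I mean is something like this (the command name
and paths are just placeholders):

    -- placeholder sketch: SHIP copies a local file to the cluster,
    -- CACHE pulls an already-uploaded HDFS file onto each node;
    -- both are clauses of DEFINE for streaming commands
    DEFINE mycmd `myscript.pl` SHIP('/local/path/myscript.pl');
    DEFINE mycmd2 `myscript.pl` CACHE('/dfs/path/myscript.pl#myscript.pl');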
On Fri, Aug 28, 2009 at 2:39 PM, zaki rahaman wrote:
I had a question about running Pig jobs on Amazon's cloud services.
Specifically, how do you go about adding UDF jar files, and what
modifications, if any, need to be made to a script to ensure it runs
effectively via MapReduce? (Do you need to ship/cache the UDF jar, and if
so, how?)
Thanks for all the help so far,