Many thanks, I'll try it.
From: Mridul Muralidharan
Sent: Wednesday, April 15, 2009 10:14 AM
Subject: Re: HDFS Distributed Cache with Pig
Using the distributed cache is actually not that tough from Pig.
First, copy your file to HDFS - say it is copied as
hdfs://host:port/mypath/file.zip for a zip file.
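For example, with the Hadoop shell (the destination path here is just
illustrative):

    hadoop fs -copyFromLocal file.zip /mypath/file.zip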
Next, in your conf file, you need to define the distributed cache
properties for the archive and its symlink.
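Presumably something along these lines - mapred.cache.archives points at
the archive, with the '#my_location' fragment naming the symlink, and
mapred.create.symlink turns symlink creation on:

    <property>
      <name>mapred.cache.archives</name>
      <value>hdfs://host:port/mypath/file.zip#my_location</value>
    </property>
    <property>
      <name>mapred.create.symlink</name>
      <value>yes</value>
    </property>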
When the job is run, this will create a symlink called 'my_location' in
the current working directory (of the UDF), where you can access the
exploded contents of your zip file.
So in your UDF, it is a simple case of opening the paths under
my_location, relative to the current working directory, to get to your
relevant files.
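A minimal sketch of such a UDF, assuming the zip contains a file named
data.txt (the class name and file name are just placeholders):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class CachedFileReader extends EvalFunc<String> {
        public String exec(Tuple input) throws IOException {
            // The 'my_location' symlink sits in the task's current
            // working directory, so a relative path is enough.
            BufferedReader reader =
                new BufferedReader(new FileReader("my_location/data.txt"));
            try {
                // Return the first line of the cached file.
                return reader.readLine();
            } finally {
                reader.close();
            }
        }
    }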
Hope this helps.
Bill Habermaas wrote:
How does the Hadoop distributed file cache work with Pig?
I have some data files and jars that are used by some UDFs I've written.
They work when I execute Pig scripts on a local file system, but they fail
with exceptions because the jars and files are not found when running in
map/reduce mode on an HDFS cluster.