I think the issue you are seeing is that the distributed cache does not, by default, create symlinks to the files it pulls over. If you want to access them through symlinks in the local working directory, call DistributedCache.createSymlink(conf) before submitting your job; otherwise, use getLocalCacheFiles and getLocalCacheArchives to find out where the files are.
One thing to be aware of is that each cache archive or cache file URI may optionally end with #<name>, where <name> is the name of the symlink you want on the compute node.
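To make the #<name> suffix concrete, here is a rough sketch of building and inspecting such a URI using only java.net.URI. The class name, the "namenode" host, and the file path are made up for illustration, and the Hadoop calls appear only in comments, so treat this as a sketch of the URI handling and not as the job setup itself.

```java
import java.net.URI;

class CacheUriExample {
    // Appends "#<linkName>" to a cache file URI.
    // Assumes fileUri does not already carry a fragment.
    static URI withLinkName(String fileUri, String linkName) {
        return URI.create(fileUri + "#" + linkName);
    }

    // Returns the requested symlink name, or null if the URI has no "#<name>" suffix.
    static String linkName(URI cacheUri) {
        return cacheUri.getFragment();
    }

    public static void main(String[] args) {
        // Hypothetical HDFS location; substitute your own.
        URI uri = withLinkName("hdfs://namenode/data/lookup.dat", "lookup");
        System.out.println(uri);
        System.out.println(linkName(uri));

        // In the job driver, this URI would then be handed to the cache:
        //   DistributedCache.addCacheFile(uri, conf);
        //   DistributedCache.createSymlink(conf);
    }
}
```

With symlinks enabled, the task should then be able to open the file by the short name ("lookup" here) relative to its working directory.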
--Bobby Evans
On 6/7/11 8:52 AM, "John Armstrong" wrote:
On Tue, 7 Jun 2011 09:41:21 -0300, "Juan P." wrote:
Not 100% clear on what you meant. You are saying I should put the file into
my HDFS cluster or should I use DistributedCache? If you suggest the latter,
could you address my original question?
I mean that you can certainly get away with putting information into a
known place on HDFS and loading it in each mapper or reducer, but that may
become very inefficient as your problem scales up. Mostly I was responding
to Shi Yu's question about why the DC is even worth using at all.
As to your question, here's how I do it, which I think I basically lifted
from an example in The Definitive Guide. There may be better ways, though.
In my setup, I put files into the DC by getting Path objects (which should
be able to reference either HDFS or local filesystem files, though I always
have my files on HDFS to start) and using
DistributedCache.addCacheFile(path.toUri(), conf);
Then within my mapper or reducer I retrieve all the cached files with
Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
IIRC, this is what you were doing. The problem is that this returns all the
cached files, although they are now in a working directory on the local
filesystem. Luckily, I know the filename of the file I want, so I iterate
over them:
for (Path cachePath : cacheFiles) {
    if (cachePath.getName().equals(cachedFilename)) {
        return cachePath;
    }
}
Then I've got the path to the local filesystem copy of the file I want in
hand and I can do whatever I want with it.
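For what it's worth, the lookup above can be wrapped in a small helper. The sketch below uses java.nio.file.Path as a stand-in for org.apache.hadoop.fs.Path so that it runs without Hadoop on the classpath; the class name, method name, and file paths are made up for illustration. With the Hadoop Path you would compare cachePath.getName() as in the loop above. It also guards against a null array, which I believe is what you get back when nothing was added to the cache.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class CachedFileLookup {
    // Returns the first path whose final component equals fileName,
    // or null if the array is null or no entry matches.
    static Path findByName(Path[] cacheFiles, String fileName) {
        if (cacheFiles == null) {
            return null;
        }
        for (Path cachePath : cacheFiles) {
            Path last = cachePath.getFileName();
            if (last != null && last.toString().equals(fileName)) {
                return cachePath;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical local cache locations, standing in for the result of
        // DistributedCache.getLocalCacheFiles(conf).
        Path[] cacheFiles = {
            Paths.get("/tmp/cache/1/stopwords.txt"),
            Paths.get("/tmp/cache/2/lookup.dat")
        };
        System.out.println(findByName(cacheFiles, "lookup.dat"));
    }
}
```

Returning null for a missing file keeps the helper simple; a caller in a mapper's setup code would probably want to fail fast with a clear error if the expected file is not there.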
hth