Hi,
@Ted, the code below is internal code. Users are not expected to call DistributedCache.getLocalCache(), and they could not use it anyway, because they do not know all the parameters.
@Larry, DistributedCache was not changed to use the new API in branch 0.20; that change was made only from branch 0.21 onward. See MAPREDUCE-898 (https://issues.apache.org/jira/browse/MAPREDUCE-898). If you are using branch 0.20, you are encouraged to use the deprecated JobConf itself.
You can try the following change in your code:
Change the line
    DistributedCache.addCacheFile(new Path(args[0]).toUri(), conf);
to
    DistributedCache.addCacheFile(new Path(args[0]).toUri(), job.getConfiguration());
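Why does this one-line change matter? As far as I can tell, new Job(conf, "Job") takes a copy of conf, so anything added to conf after the Job is created never reaches the job, while job.getConfiguration() returns the job's own copy. The toy model below illustrates only that copy-on-construct behaviour with plain Java collections. ToyConf, ToyJob and the "toy.cache.files" key are hypothetical stand-ins, not Hadoop classes. Under the same assumption, calling addCacheFile(..., conf) before constructing the Job should also work.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (NOT Hadoop's API) of why the original driver lost its cache file.
 * Assumption: like Hadoop's Job constructor, ToyJob copies the configuration it
 * is given, so later changes to the original object are not seen by the job.
 */
public class CopiedConfDemo {

    /** Hypothetical stand-in for org.apache.hadoop.conf.Configuration. */
    static class ToyConf {
        final Map<String, String> props = new HashMap<>();

        ToyConf() {}

        // Copy constructor: duplicates all properties at construction time.
        ToyConf(ToyConf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    /** Hypothetical stand-in for org.apache.hadoop.mapreduce.Job. */
    static class ToyJob {
        private final ToyConf conf;

        ToyJob(ToyConf conf) {
            this.conf = new ToyConf(conf); // snapshot, not a shared reference
        }

        ToyConf getConfiguration() { return conf; }
    }

    // Hypothetical key standing in for whatever property addCacheFile() writes.
    static final String CACHE_KEY = "toy.cache.files";

    public static void main(String[] args) {
        // Pattern from the original post: modify conf AFTER creating the job.
        ToyConf conf = new ToyConf();
        ToyJob broken = new ToyJob(conf);
        conf.set(CACHE_KEY, "/myapp/lookup.dat");
        System.out.println("broken job sees: " + broken.getConfiguration().get(CACHE_KEY));

        // Suggested fix: modify the job's own configuration instead.
        ToyJob fixed = new ToyJob(new ToyConf());
        fixed.getConfiguration().set(CACHE_KEY, "/myapp/lookup.dat");
        System.out.println("fixed job sees: " + fixed.getConfiguration().get(CACHE_KEY));
    }
}
```

Running main prints "broken job sees: null" followed by "fixed job sees: /myapp/lookup.dat", which mirrors the null paths Larry saw in setup().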
Thanks
Amareshwari
On 4/16/10 2:27 AM, "Ted Yu" wrote:
Please take a look at the loop starting at line 158 in TaskRunner.java:
    p[i] = DistributedCache.getLocalCache(files[i], conf,
                                          new Path(baseDir),
                                          fileStatus,
                                          false, Long.parseLong(fileTimestamps[i]),
                                          new Path(workDir.getAbsolutePath()),
                                          false);
}
DistributedCache.setLocalFiles(conf, stringifyPathArray(p));
I think the confusing part is that DistributedCache.getLocalCacheFiles() is paired with DistributedCache.setLocalFiles().
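To make that pairing concrete: the framework (TaskRunner) writes the localized paths with setLocalFiles(), and getLocalCacheFiles() in the task reads the same setting back, so it returns null when nothing was written. The sketch below is a toy model with a plain Map, not Hadoop's implementation. It assumes the paths are stored as one comma-separated property (in 0.20 I believe the key is mapred.cache.localFiles, but treat that as an assumption); the "toy.cache.localFiles" key and the file paths are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (NOT Hadoop's API) of the setLocalFiles()/getLocalCacheFiles() pairing.
 * Assumption: the framework stores the localized paths in one comma-separated
 * property, and the task-side getter parses that same property, returning null
 * when it was never written.
 */
public class LocalFilesPairingDemo {

    // Hypothetical key; the real property name is an implementation detail.
    static final String LOCAL_FILES_KEY = "toy.cache.localFiles";

    /** Framework side: roughly what TaskRunner does after localizing the files. */
    static void setLocalFiles(Map<String, String> conf, String[] localPaths) {
        conf.put(LOCAL_FILES_KEY, String.join(",", localPaths));
    }

    /** Task side: what a mapper's setup() reads back. */
    static String[] getLocalCacheFiles(Map<String, String> conf) {
        String joined = conf.get(LOCAL_FILES_KEY);
        if (joined == null) {
            return null; // nothing was localized for this task
        }
        return joined.split(",");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Before the framework has written anything, the getter yields null.
        System.out.println(Arrays.toString(getLocalCacheFiles(conf)));

        setLocalFiles(conf, new String[] {"/tmp/cache/lookup.dat", "/tmp/cache/stop.txt"});
        System.out.println(Arrays.toString(getLocalCacheFiles(conf)));
    }
}
```

The takeaway is that getLocalCacheFiles() can only return what the framework localized, and the framework only localizes files that were registered in the job's own configuration.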
Cheers
On Thu, Apr 15, 2010 at 1:16 PM, Larry Compton wrote:
Ted,
Thanks. I have looked at that example. The javadocs for DistributedCache
still refer to deprecated classes, like JobConf. I'm trying to use the
revised API.
Larry
On Thu, Apr 15, 2010 at 4:07 PM, Ted Yu wrote:
Please see the sample within src\core\org\apache\hadoop\filecache\DistributedCache.java:
 * JobConf job = new JobConf();
 * DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"),
 *                               job);
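The URI in that javadoc sample has two parts: the path of the file to distribute and a "#lookup.dat" fragment. As I understand the 0.20 javadoc, the fragment names the symlink created in the task's working directory when symlinking is enabled. The sketch below only shows how java.net.URI splits such a string; the helper names filePath and linkName are hypothetical.

```java
import java.net.URI;

/**
 * Splits a DistributedCache-style URI into the file to ship and the "#fragment"
 * link name. Only java.net.URI parsing is shown; no Hadoop classes are used.
 */
public class CacheUriFragmentDemo {

    /** Hypothetical helper: the path of the file to distribute. */
    static String filePath(String spec) {
        return URI.create(spec).getPath();
    }

    /** Hypothetical helper: the fragment (symlink name), or null if absent. */
    static String linkName(String spec) {
        return URI.create(spec).getFragment();
    }

    public static void main(String[] args) {
        String spec = "/myapp/lookup.dat#lookup.dat";
        System.out.println("path     = " + filePath(spec));
        System.out.println("fragment = " + linkName(spec));
    }
}
```

URI.create() is used instead of the URI(String) constructor so that a malformed string surfaces as an unchecked IllegalArgumentException rather than a checked URISyntaxException.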
On Thu, Apr 15, 2010 at 12:56 PM, Larry Compton wrote:
I'm trying to use the distributed cache in a MapReduce job written to the
new API (org.apache.hadoop.mapreduce.*). In my "Tool" class, a file path is
added to the distributed cache as follows:
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = new Job(conf, "Job");
        ...
        DistributedCache.addCacheFile(new Path(args[0]).toUri(), conf);
        ...
        return job.waitForCompletion(true) ? 0 : 1;
    }
The "setup()" method in my mapper tries to read the path as follows:
    protected void setup(Context context) throws IOException {
        Path[] paths = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    }
But "paths" is null.
I'm assuming I'm setting up the distributed cache incorrectly. I've seen a
few hints in previous mailing list postings that indicate that the
distributed cache is accessed via the Job and JobContext objects in the
revised API, but the javadocs don't seem to support that.
Thanks.
Larry