Michael Segel
at Jul 30, 2011 at 2:50 am
Here's the meat of my post earlier...
Sample code for putting a file in the cache:
DistributedCache.addCacheFile(new URI(path + "MyFileName"), conf);
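The call above builds the cache URI by plain string concatenation, so path has to end with a "/". Otherwise "/user/hadoop" + "MyFileName" gives "/user/hadoopMyFileName". Here is a minimal sketch that uses only java.net.URI and adds the separator when needed. The class name CacheUris, the method cacheUri and the sample paths are hypothetical.

```java
import java.net.URI;

// Sketch: building the URI handed to DistributedCache.addCacheFile(uri, conf).
// CacheUris/cacheUri and the sample directories are hypothetical names.
final class CacheUris {

    // Joins a directory and a file name, adding the '/' separator only if the
    // directory does not already end with one, then resolves the name against it.
    static URI cacheUri(String dir, String fileName) {
        String base = dir.endsWith("/") ? dir : dir + "/";
        // URI.create throws an unchecked IllegalArgumentException on bad syntax.
        return URI.create(base).resolve(fileName);
    }
}
```

The resulting URI is what you would pass as the first argument of DistributedCache.addCacheFile(uri, conf).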
Sample code for pulling data off the cache:
// Inside your Mapper or Reducer (e.g. in setup()), where the context is available:
Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
boolean exitProcess = false;
int i = 0;
while (!exitProcess) {
    String fileName = localFiles[i].getName();
    if (fileName.equalsIgnoreCase("model.txt")) {
        // Build your input file reader on localFiles[i].toString()
        exitProcess = true;
    }
    i++;
}
Note that this is SAMPLE code. I didn't trap the case where the file isn't there; if it is missing, the loop runs past the end of the localFiles[] array and throws an ArrayIndexOutOfBoundsException.
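Here is a minimal sketch of a lookup that cannot run past the end of the array and reports a missing file instead of crashing. To keep it self-contained, it takes plain strings (what localFiles[i].toString() returns) rather than Hadoop Path objects. The class name CacheLookup, the method findLocalCacheFile and the sample paths are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of a bounds-safe version of the lookup loop. In a real task the entries
// would come from DistributedCache.getLocalCacheFiles(...); here their string
// forms stand in for them. CacheLookup/findLocalCacheFile are hypothetical names.
final class CacheLookup {

    // Returns the full local path of the first entry whose file name matches
    // wanted (ignoring case), or null if nothing matches.
    static String findLocalCacheFile(String[] localFiles, String wanted) {
        if (localFiles == null) {
            // Assumption: treat a null array (e.g. nothing cached) as "not found".
            return null;
        }
        // The for-each loop stops at the end of the array, so no index overflow.
        for (String local : localFiles) {
            Path name = Paths.get(local).getFileName();
            if (name != null && name.toString().equalsIgnoreCase(wanted)) {
                return local; // full path, not just the file name
            }
        }
        return null;
    }
}
```

In a task you would call it with the strings from the cached paths and fail with a clear error message when it returns null.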
Also, I start with exitProcess set to false because it's easier to read the loop as "Do this loop until the condition exitProcess is true".
When you build your file reader, you need the full local path, not just the file name. The path will vary each time the job runs.
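Here is a minimal sketch of building that reader from the full local path. It assumes the cached file is UTF-8 text and uses only Java 7+ java.nio.file classes. The names CacheReader and readLines are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Sketch: open the cached file through its full local path (for example the
// string from localFiles[i].toString()), never through the bare file name.
// CacheReader/readLines are hypothetical names; UTF-8 content is an assumption.
final class CacheReader {

    // Reads every line of the file at fullLocalPath into a list.
    static List<String> readLines(String fullLocalPath) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(fullLocalPath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```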
HTH
-Mike
From: michael_segel@hotmail.com
To: common-user@hadoop.apache.org
Subject: RE: Moving Files to Distributed Cache in MapReduce
Date: Fri, 29 Jul 2011 21:43:37 -0500
I could have sworn that I gave an example earlier this week of how to push and pull stuff from the distributed cache.
Date: Fri, 29 Jul 2011 14:51:26 -0700
Subject: Re: Moving Files to Distributed Cache in MapReduce
From: rogchen@ucdavis.edu
To: common-user@hadoop.apache.org
JobConf is deprecated in 0.20.2, I believe; you're supposed to be using Configuration for that.
On Fri, Jul 29, 2011 at 1:59 PM, Mohit Anchlia wrote:
Is this what you are looking for?
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html (search for jobConf)
On Fri, Jul 29, 2011 at 1:51 PM, Roger Chen wrote:
Thanks for the response! However, I'm having an issue with this line:
Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
because conf has private access in org.apache.hadoop.conf.Configured.
On Fri, Jul 29, 2011 at 11:18 AM, Mapred Learn <mapred.learn@gmail.com> wrote:
I hope my previous reply helps...
On Fri, Jul 29, 2011 at 11:11 AM, Roger Chen <rogchen@ucdavis.edu> wrote:
After moving it to the distributed cache, how would I call it within my MapReduce program?
On Fri, Jul 29, 2011 at 11:09 AM, Mapred Learn <mapred.learn@gmail.com> wrote:
Did you try using the -files option in your hadoop jar command, as in:
/usr/bin/hadoop jar <jar name> <main class name> -files <absolute path of file to be added to distributed cache> <input dir> <output dir>
On Fri, Jul 29, 2011 at 11:05 AM, Roger Chen <rogchen@ucdavis.edu> wrote:
Slight modification: I now know how to add files to the distributed file cache, which can be done via this command placed in the main or run class:
DistributedCache.addCacheFile(new URI("/user/hadoop/thefile.dat"), conf);
However, I am still having trouble locating the file in the distributed cache. *How do I get the path of thefile.dat in the distributed cache as a string?* I am using Hadoop 0.20.2.
On Fri, Jul 29, 2011 at 10:26 AM, Roger Chen <rogchen@ucdavis.edu> wrote:
Hi all,
Does anybody have examples of how one moves files from the local file system/HDFS to the distributed cache in MapReduce? A Google search turned up examples in Pig but not MR.
--
Roger Chen
UC Davis Genome Center