Moving Files to Distributed Cache in MapReduce
Hi all,

Does anybody have examples of how to move files from the local file system
or HDFS to the distributed cache in MapReduce? A Google search turned up
examples for Pig but not for MapReduce.

--
Roger Chen
UC Davis Genome Center

  • Roger Chen at Jul 29, 2011 at 6:06 pm
    Slight modification: I now know how to add files to the distributed
    cache, which can be done with this call placed in the main or run class:

    DistributedCache.addCacheFile(new URI("/user/hadoop/thefile.dat"), conf);

    However, I am still having trouble locating the file in the distributed
    cache. *How do I get the path of thefile.dat in the distributed cache
    as a string?* I am using Hadoop 0.20.2.

  • Mapred Learn at Jul 29, 2011 at 6:09 pm
    Did you try the -files option in your hadoop jar command, like this:

    /usr/bin/hadoop jar <jar name> <main class name> -files <absolute path of file to be added to distributed cache> <input dir> <output dir>

  • Roger Chen at Jul 29, 2011 at 6:11 pm
    After moving it to the distributed cache, how would I call it within my
    MapReduce program?
  • Mapred Learn at Jul 29, 2011 at 6:19 pm
    I hope my previous reply helps...
  • Arindam Khaled at Jul 29, 2011 at 6:56 pm
    Please unsubscribe me.
  • Roger Chen at Jul 29, 2011 at 8:51 pm
    Thanks for the response! However, I'm having an issue with this line:

    Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);

    The compiler says conf has private access in org.apache.hadoop.conf.Configured.
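
The compiler message suggests the class extends Hadoop's Configured and reads its private conf field directly; Configured exposes that object through getConf(). Below is a minimal stand-alone sketch of the same pattern. BaseConfigured, CacheJob, the Map used in place of Hadoop's Configuration, and the key "cache.file" are all hypothetical stand-ins so that it runs without Hadoop:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configured: the field is
// private, so subclasses must go through the public getter.
class BaseConfigured {
    private final Map<String, String> conf = new HashMap<>();

    public Map<String, String> getConf() {
        return conf;
    }
}

// Hypothetical job class; the key name "cache.file" is made up for illustration.
class CacheJob extends BaseConfigured {
    String cacheFile() {
        // return conf.get("cache.file");   // would not compile: conf has private access
        return getConf().get("cache.file"); // compiles: uses the inherited getter
    }

    public static void main(String[] args) {
        CacheJob job = new CacheJob();
        job.getConf().put("cache.file", "/user/hadoop/thefile.dat");
        System.out.println(job.cacheFile());
    }
}
```

In a real driver that extends Configured, the equivalent change would be to pass getConf() instead of conf; inside a mapper, the configuration comes from context.getConfiguration().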
  • Mohit Anchlia at Jul 29, 2011 at 8:59 pm
    Is this what you are looking for?

    http://hadoop.apache.org/common/docs/current/mapred_tutorial.html

    Search for JobConf.
  • Roger Chen at Jul 29, 2011 at 9:51 pm
    JobConf is deprecated in 0.20.2, I believe; you're supposed to use
    Configuration instead.
  • Roger Chen at Jul 29, 2011 at 11:22 pm
    Hi all, I have now resolved my issue by adding a try/catch block. Thanks
    for all the help!
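
The thread does not say which call needed the try/catch. Two calls quoted in this thread declare checked exceptions: new URI(String) throws URISyntaxException, and DistributedCache.getLocalCacheFiles throws IOException. Here is a minimal sketch of handling the first one using only the JDK; the class and method names (CacheUris, toCacheUri) are hypothetical:

```java
import java.net.URI;
import java.net.URISyntaxException;

class CacheUris {
    // Converts a path string to a URI, turning the checked exception into an
    // unchecked one that names the offending path.
    static URI toCacheUri(String path) {
        try {
            return new URI(path);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad cache file path: " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toCacheUri("/user/hadoop/thefile.dat").getPath());
    }
}
```

Rethrowing as an unchecked exception is only one option; declaring throws URISyntaxException on the calling method would also satisfy the compiler.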
  • Michael Segel at Jul 30, 2011 at 2:44 am
    I could have sworn that I gave an example earlier this week on how to push and pull stuff from distributed cache.

  • Michael Segel at Jul 30, 2011 at 2:50 am
    Here's the meat of my earlier post...

    Sample code for putting a file on the cache:

    DistributedCache.addCacheFile(new URI(path + "MyFileName"), conf);

    Sample code for pulling data off the cache:

    Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    boolean exitProcess = false;
    int i = 0;
    while (!exitProcess) {
        String fileName = localFiles[i].getName();
        if (fileName.equalsIgnoreCase("model.txt")) {
            // Build your input file reader on localFiles[i].toString()
            exitProcess = true;
        }
        i++;
    }

    Note that this is SAMPLE code. I didn't trap the exit condition, so if the file isn't there you will run past the end of the array localFiles[].
    Also, I set exitProcess to false because it's easier to read this as "Do this loop until the condition exitProcess is true".

    When you build your file reader you need the full path, not just the file name. The path will vary when the job runs.

    HTH

    -Mike
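
The sample above does not handle the case where the file is missing from the cache. A bounds-safe version of the same search might look like the sketch below. To keep it runnable without Hadoop, it uses java.nio.file.Path as a stand-in for org.apache.hadoop.fs.Path; the class and method names (CacheLookup, findByName) are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

class CacheLookup {
    // Returns the first path whose file name matches (ignoring case), or empty
    // if the array is null or holds no such file.
    static Optional<Path> findByName(Path[] localFiles, String wanted) {
        if (localFiles == null) {
            return Optional.empty(); // defensive: no cache files available
        }
        for (Path p : localFiles) {
            if (p != null && p.getFileName() != null
                    && p.getFileName().toString().equalsIgnoreCase(wanted)) {
                return Optional.of(p); // full local path, not just the name
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Path[] files = { Paths.get("/tmp/cache/other.dat"), Paths.get("/tmp/cache/Model.txt") };
        System.out.println(findByName(files, "model.txt").isPresent());
        System.out.println(findByName(files, "missing.txt").isPresent());
    }
}
```

Returning an Optional makes the "file not found" case explicit, so the caller decides whether to fail the task or fall back to a default.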

  • Allen Wittenauer at Aug 1, 2011 at 2:21 am
    We really need to add a working example to the wiki and link to it from the FAQ page. Any volunteers?
  • Michael Segel at Aug 1, 2011 at 12:25 pm
    Yeah,

    I'll write something up and post it on my web site. Definitely not InfoQ material, just a simple tips-and-tricks piece.

    -Mike

  • Mapred Learn at Jul 29, 2011 at 6:12 pm
    OK, to access it in your mapper code, you can do something like:

    Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);

    String fileName = "";
    for (Path p : cacheFiles) {
        if (p != null) {
            // getName() is the file name only; p.toString() gives the full local path
            fileName = p.getName();
        }
    }
