FAQ
What does HADOOP_CLASSPATH set in $HADOOP/conf/hadoop-env.sh do?

This isn't clear to me from documentation and books, so I did some
experimenting. Here's the conclusion I came to: the paths in
HADOOP_CLASSPATH are added to the class path of the Job Client, but they are
not added to the class path of the Task Trackers. Therefore if you put a JAR
called MyJar.jar on the HADOOP_CLASSPATH and don't do anything to make it
available to the Task Trackers as well, calls to MyJar.jar code from the
run() method of your job work, but calls from your Mapper or Reducer will
fail at runtime. Is this correct?

If it is, what is the proper way to make MyJar.jar available to both the Job
Client and the Task Trackers?
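For reference, a minimal sketch of the setting under discussion; the jar path below is a placeholder, not from the original thread:

```shell
# In $HADOOP/conf/hadoop-env.sh -- placeholder path, for illustration only.
# Entries here end up on the classpath of client-side processes (e.g. the
# Job Client started by `hadoop jar`), not on the task JVMs' classpath.
export HADOOP_CLASSPATH="/path/to/MyJar.jar:${HADOOP_CLASSPATH}"
```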

  • John Armstrong at Aug 22, 2011 at 6:06 pm

    On Mon, 22 Aug 2011 11:01:23 -0700, "W.P. McNeill" wrote:
    If it is, what is the proper way to make MyJar.jar available to both the Job Client and the Task Trackers?
    Do you mean the task trackers, or the tasks themselves? What process do
    you want to be able to run the code in MyJar.jar?
  • W.P. McNeill at Aug 22, 2011 at 6:49 pm
    I meant tasks running on the Task Trackers.

    Harsh J.'s answer is what I needed. This makes sense now.
  • Harsh J at Aug 22, 2011 at 6:18 pm

    On Mon, Aug 22, 2011 at 11:31 PM, W.P. McNeill wrote:
    What does HADOOP_CLASSPATH set in $HADOOP/conf/hadoop-env.sh do?

    This isn't clear to me from documentation and books, so I did some
    experimenting. Here's the conclusion I came to: the paths in
    HADOOP_CLASSPATH are added to the class path of the Job Client, but they are
    not added to the class path of the Task Trackers. Therefore if you put a JAR
    called MyJar.jar on the HADOOP_CLASSPATH and don't do anything to make it
    available to the Task Trackers as well, calls to MyJar.jar code from the
    run() method of your job work, but calls from your Mapper or Reducer will
    fail at runtime. Is this correct?
    Yes, this is right.
    If it is, what is the proper way to make MyJar.jar available to both the Job Client and the Task Trackers?
    You'll need to use the Distributed Cache. Or you'd need to start the
    TaskTrackers with the library on their classpath (which is inherited
    by the launched task JVMs). The latter approach is rigid and
    inflexible when it comes to jar versioning.

    --
    Harsh J
  • GOEKE, MATTHEW (AG/1000) at Aug 22, 2011 at 6:20 pm
    If you are asking how to make those classes available at run time, you can either use the -libjars option (which ships them via the distributed cache) or shade those classes into your jar using Maven. I have had enough issues in the past with flaky classpaths that I prefer the shading method, but obviously that is not the preferred route.

    Matt

    -----Original Message-----
    From: W.P. McNeill
    Sent: Monday, August 22, 2011 1:01 PM
    To: common-user@hadoop.apache.org
    Subject: Making sure I understand HADOOP_CLASSPATH

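To make the distributed-cache route discussed above concrete, here is a hedged sketch of the usual command-line form; the jar, class, and path names are placeholders, and -libjars only takes effect if the job's driver parses generic options via ToolRunner/GenericOptionsParser:

```shell
# Ship MyJar.jar to the cluster via the distributed cache and add it to
# the classpath of both the Job Client and the launched task JVMs.
# All names and paths below are placeholders.
hadoop jar myjob.jar com.example.MyDriver \
    -libjars /path/to/MyJar.jar \
    input/ output/
```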

Discussion Overview
group: common-user@hadoop.apache.org
categories: hadoop
posted: Aug 22, 2011 at 6:01 PM
active: Aug 22, 2011 at 6:49 PM
posts: 5
users: 4
website: hadoop.apache.org...
irc: #hadoop