Hello All,

I am new to Hadoop, and I am trying to use the GenericOptionsParser class.
In particular, I would like to use the -libjar option to specify additional
jar files to include in the classpath. I've created a class that extends
Configured and implements Tool:

public class OptionDemo extends Configured implements Tool
{
    ...

    public int run(String[] args) throws Exception
    {
        Configuration conf = getConf();
        GenericOptionsParser opts = new GenericOptionsParser(conf, args);
        ...
    }
}


However, when I run my code the jar files that I include after -libjar
aren't being added to the classpath and I receive an error that certain
classes can't be found during the execution of my job.

The book Hadoop: The Definitive Guide states:

You don’t usually use GenericOptionsParser directly, as it’s more convenient
to implement the Tool interface and run your application with the
ToolRunner, which uses GenericOptionsParser internally:
public interface Tool extends Configurable {
int run(String [] args) throws Exception;
}

but it still isn't clear to me how the -libjars option is parsed, whether or
not I need to explicitly add it to the classpath inside my run method, or
where it needs to be placed in the command-line? Any advice or sample code
on using -libjar would be greatly appreciated.

--
Aquil H. Abdullah
aquil.abdullah@gmail.com

  • John Armstrong at Aug 1, 2011 at 4:23 pm

    On Mon, 1 Aug 2011 12:11:27 -0400, "Aquil H. Abdullah" wrote:
    > but it still isn't clear to me how the -libjars option is parsed, whether
    > or not I need to explicitly add it to the classpath inside my run method,
    > or where it needs to be placed in the command-line?

    IIRC it's parsed as a comma-separated list of file paths relative to your
    current working directory, and the local copies that it makes on each
    cluster node are automatically added to the tasks' classpaths.

    Can you give an example of how you're trying to use it?
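
As a rough illustration of the parsing described above, the sketch below splits a comma-separated list and resolves relative entries against the current working directory. It is a plain-Java stand-in and not Hadoop's own code; the class name LibJarsSketch, the method resolveLibJars, and the sample paths are made up for this example.

```java
import java.io.File;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class LibJarsSketch {

    /**
     * Splits a comma-separated list of jar paths (the kind of value passed
     * after -libjars) and resolves relative entries against baseDir.
     * Hypothetical helper for illustration only.
     */
    static List<URI> resolveLibJars(String libJarsValue, File baseDir) {
        List<URI> uris = new ArrayList<>();
        for (String entry : libJarsValue.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore empty entries such as "a.jar,,b.jar"
            }
            File jar = new File(trimmed);
            if (!jar.isAbsolute()) {
                jar = new File(baseDir, trimmed); // relative to the working directory
            }
            uris.add(jar.toURI());
        }
        return uris;
    }

    public static void main(String[] args) {
        File cwd = new File(System.getProperty("user.dir"));
        // Sample value only; these jars do not need to exist for the sketch to run.
        for (URI uri : resolveLibJars("lib/a.jar,lib/b.jar", cwd)) {
            System.out.println(uri);
        }
    }
}
```

Note that the list is a single argument: the jars are separated by commas, not spaces.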
  • Harsh J at Aug 1, 2011 at 4:57 pm
    Aquil,

    On a side note, if you use Tool, GenericOptionsParser is used
    automatically (internally, by ToolRunner), so you don't have to re-parse
    your args in your run(…) method. What you get in run(args) are only the
    remaining args, if your application handles any.

    It would help, as John pointed out, if you could give the exact CLI
    command you are invoking.
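
The flow described here can be modelled in a few lines of plain Java. This is only a simplified sketch: MiniTool and the run method below are hypothetical stand-ins for Hadoop's Tool and ToolRunner, only -libjars and -D are handled, and the use of a map with a "tmpjars" key is an assumption about how the real parser records the jar list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToolFlowSketch {

    /** Hypothetical stand-in for Tool: it only ever sees the leftover args. */
    interface MiniTool {
        int run(Map<String, String> conf, String[] remainingArgs);
    }

    /** Hypothetical stand-in for the runner: consume generic options, then delegate. */
    static int run(MiniTool tool, String[] args) {
        Map<String, String> conf = new HashMap<>();
        List<String> remaining = new ArrayList<>();
        int i = 0;
        while (i < args.length) {
            if ("-libjars".equals(args[i]) && i + 1 < args.length) {
                // Assumption: the jar list is stored in the configuration as "tmpjars".
                conf.put("tmpjars", args[i + 1]);
                i += 2;
            } else if ("-D".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[i + 1].split("=", 2);
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
                i += 2;
            } else {
                // First unrecognized token: it and everything after it belong to the tool.
                remaining.addAll(Arrays.asList(args).subList(i, args.length));
                break;
            }
        }
        return tool.run(conf, remaining.toArray(new String[0]));
    }

    public static void main(String[] args) {
        String[] cli = {"-libjars", "lib/a.jar,lib/b.jar", "-D", "my.key=v", "input", "output"};
        int exitCode = run((conf, rest) -> {
            System.out.println("conf = " + conf);
            System.out.println("remaining = " + Arrays.toString(rest));
            return 0;
        }, cli);
        System.out.println("exit code " + exitCode);
    }
}
```

In this model the tool never parses the generic options itself, and the generic options come before the application's own arguments. On a real command line that corresponds to placing them after the main class name, for example: hadoop jar app.jar MainClass -libjars a.jar,b.jar input output (app.jar and MainClass are placeholder names).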

    --
    Harsh J
  • Aquil H. Abdullah at Aug 1, 2011 at 5:21 pm
    [See Response Inline]

    I've tried invoking getLib
    On Mon, Aug 1, 2011 at 12:56 PM, Harsh J wrote:

    > Aquil,
    >
    > On a side-note, if you use Tool, GenericOptionsParser is automatically
    > used internally (by ToolRunner), so you don't have to re-parse your
    > args in your run(…) method. What you get as run(args) are the remnant
    > args alone, if your application handles any.

    [AA] Thanks for clearing that up!

    > Would help, as John pointed out, if you could give your exact,
    > invoking CLI command.

    [AA] I am currently invoking my application as follows:

    hadoop jar /home/test/hadoop/test.option.demo.jar
    test.option.demo.OptionDemo -libjar /home/test/hadoop/lib/mytestlib.jar



    --
    Aquil H. Abdullah
    aquil.abdullah@gmail.com
  • John Armstrong at Aug 1, 2011 at 6:18 pm

    On Mon, 1 Aug 2011 13:21:27 -0400, "Aquil H. Abdullah" wrote:
    > [AA] I am currently invoking my application as follows:
    >
    > hadoop jar /home/test/hadoop/test.option.demo.jar
    > test.option.demo.OptionDemo -libjar /home/test/hadoop/lib/mytestlib.jar

    I believe the problem might be that it's looking for "-libjars", not
    "-libjar".
  • Aquil H. Abdullah at Aug 1, 2011 at 7:31 pm
    Don't I feel sheepish...

    OK, so I've hacked this sample code below, from the ConfigurationPrinter
    example in Hadoop: The Definitive Guide. If -libjars had been added to the
    configuration I would expect to see it when I iterate over the urls, however
    I see it as one of the remaining options:

    ***OUTPUT***
    remaining args -libjars
    remaining args C:\Apps\mahout-distribution-0.5\mahout-core-0.5.jar
    ***
    [Source Code]
    package test.option.demo;

    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.util.*;
    // import java.util.*;
    import java.net.URL;
    // import java.util.Map.Entry;

    public class OptionDemo extends Configured implements Tool {
        static
        {
            Configuration.addDefaultResource("hdfs-default.xml");
            Configuration.addDefaultResource("hdfs-site.xml");
            Configuration.addDefaultResource("mapred-default.xml");
            Configuration.addDefaultResource("mapred-site.xml");
        }

        @Override
        public int run(String[] args) throws Exception
        {
            // Parse the args handed to run() again and use the parser's configuration.
            GenericOptionsParser opt = new GenericOptionsParser(args);
            Configuration conf = opt.getConfiguration();
            // for (Entry<String, String> entry: conf)
            // {
            //     System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
            // }

            // Print every argument that reached run().
            for (int i = 0; i < args.length; i++)
            {
                System.out.printf("remaining args %s\n", args[i]);
            }

            // List the jars that were registered in the configuration, if any.
            URL[] urls = GenericOptionsParser.getLibJars(conf);

            if (urls != null)
            {
                for (int j = 0; j < urls.length; j++)
                {
                    System.out.printf("url[%d] %s\n", j, urls[j].toString());
                }
            }
            else
            {
                System.out.println("No libraries added to configuration");
            }

            return 0;
        }

        public static void main(String[] args) throws Exception
        {
            int exitCode = ToolRunner.run(new OptionDemo(), args);
            System.exit(exitCode);
        }
    }



    --
    Aquil H. Abdullah
    aquil.abdullah@gmail.com
  • John Armstrong at Aug 1, 2011 at 7:50 pm

    On Mon, 1 Aug 2011 15:30:49 -0400, "Aquil H. Abdullah" wrote:
    > Don't I feel sheepish...

    Happens to the best, or so they tell me.

    > OK, so I've hacked this sample code below, from the ConfigurationPrinter
    > example in Hadoop: The Definitive Guide. If -libjars had been added to the
    > configuration I would expect to see it when I iterate over the urls,
    > however I see it as one of the remaining options:

    It might help you to read over the source code of the ToolRunner class. I
    know it did for me.

Discussion Overview
group: common-user
category: hadoop
posted: Aug 1, '11 at 4:11p
active: Aug 1, '11 at 7:50p
posts: 7
users: 3
website: hadoop.apache.org...
irc: #hadoop