basic hadoop job help
Hey all,

I'm trying to get Hadoop up and running as a proof of concept to make an argument for moving away from a big RDBMS. I'm having some challenges just getting a really simple demo mapreduce to run. The examples I have seen on the web tend to make use of classes that are now deprecated in the latest hadoop (0.20.1). It is not clear what the equivalent newer classes are in some cases.

Anyway, I am stuck at this exception - here it is start to finish:
---------------
$ ./bin/hadoop jar ./testdata/RetailTest.jar RetailTest testdata outputdata
10/02/18 09:24:55 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/02/18 09:24:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/02/18 09:24:55 INFO input.FileInputFormat: Total input paths to process : 5
10/02/18 09:24:56 INFO input.FileInputFormat: Total input paths to process : 5
Exception in thread "Thread-13" java.lang.IllegalStateException: Shutdown in progress
        at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:39)
        at java.lang.Runtime.addShutdownHook(Runtime.java:192)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1387)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:191)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.hadoop.mapred.FileOutputCommitter.cleanupJob(FileOutputCommitter.java:61)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:245)
------------

Now here is the code that actually starts things up (not including the actual mapreduce code). I initially suspected this code because I was guessing at the correct non-deprecated classes to use:

public int run(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job2 = new Job(conf);
    job2.setJobName("RetailTest");
    job2.setJarByClass(RetailTest.class);
    job2.setMapperClass(RetailMapper.class);
    job2.setReducerClass(RetailReducer.class);
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(Text.class);
    job2.setNumReduceTasks(1);
    // this was a guess on my part as I could not find out the "recommended way"
    job2.setWorkingDirectory(new Path(args[0]));
    FileInputFormat.setInputPaths(job2, new Path(args[0]));
    FileOutputFormat.setOutputPath(job2, new Path(args[1]));
    job2.submit();
    return 0;
}

/**
 * @param args
 */
public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new RetailTest(), args);
    System.exit(res);
}

Can someone sanity check me here? Much appreciated.

Regards,

Cory


  • Eric Arenas at Feb 18, 2010 at 5:53 pm
    Hi Cory,

    regarding the part that you are not sure about:


    String inputdir = args[0];
    String outputdir = args[1];
    int numberReducers = Integer.parseInt(args[2]);
    // it is better to at least pass the number of reducers as a parameter,
    // or read it from the XML job config file if you want

    // setting the number of reducers to 1, as you had in your code, *might*
    // make it slower to process and generate the output
    // if you are trying to sell the idea of Hadoop as a new ETL tool,
    // you want it to be as fast as you can

    ...................

    job.setNumReduceTasks(numberReducers);
    FileInputFormat.setInputPaths(job, inputdir);
    FileOutputFormat.setOutputPath(job, new Path(outputdir));

    return job.waitForCompletion(true) ? 0 : 1;

    } // end of run method


    Unless that line was left over from code you copy/pasted, I do not see why you need to call setWorkingDirectory in your M/R job.
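
    For concreteness, here is one way the whole driver could look. This is only a sketch against the new org.apache.hadoop.mapreduce API in 0.20.x, reusing the class names from the original post, not code from this thread. Going by the stack trace, the likely cause of the "Shutdown in progress" exception is that submit() returns immediately, so main() reaches System.exit() while the LocalJobRunner thread is still running; its cleanup then tries to register a shutdown hook during JVM shutdown. waitForCompletion(true) avoids that by blocking until the job finishes.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class RetailTest extends Configured implements Tool {

        public int run(String[] args) throws Exception {
            // reuse the Configuration that ToolRunner has already parsed;
            // implementing Tool is also what the mapred.JobClient WARN asks for
            Job job = new Job(getConf(), "RetailTest");
            job.setJarByClass(RetailTest.class);
            job.setMapperClass(RetailMapper.class);
            job.setReducerClass(RetailReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.setNumReduceTasks(Integer.parseInt(args[2]));

            // no setWorkingDirectory call; just point at the input and output
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // block until the job finishes instead of submit()-and-return,
            // so the JVM is not shutting down while the local runner is mid-job
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new RetailTest(), args));
        }
    }

    Run it with the reducer count as a third argument, for example:

    $ bin/hadoop jar RetailTest.jar RetailTest testdata outputdata 2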

    Give this a try and let me know,

    regards,
    Eric Arenas



  • C Berg at Feb 18, 2010 at 7:19 pm
    Hi Eric,

    Thanks for the advice, that is very much appreciated. With your help I was able to get past the mechanical part to something a bit more substantive: wrapping my head around doing an actual business calculation in a MapReduce way. Any recommendations on some tutorials that cover real-world examples other than word counting and the like?

    Thanks again,

    Cory
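
    For a flavor of what a "business calculation in a MapReduce way" can look like, here is a toy sketch. It is purely illustrative (the CSV layout and the RevenuePerStore class names are invented, not from this thread): it totals revenue per store from input lines of the form storeId,sku,quantity,unitPrice.

    import java.io.IOException;

    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class RevenuePerStore {

        public static class RevenueMapper
                extends Mapper<LongWritable, Text, Text, DoubleWritable> {
            private final Text storeId = new Text();
            private final DoubleWritable revenue = new DoubleWritable();

            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                // input line: storeId,sku,quantity,unitPrice
                String[] fields = line.toString().split(",");
                storeId.set(fields[0]);
                // revenue for one line item = quantity * unit price
                revenue.set(Integer.parseInt(fields[2]) * Double.parseDouble(fields[3]));
                context.write(storeId, revenue);
            }
        }

        public static class RevenueReducer
                extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
            protected void reduce(Text storeId, Iterable<DoubleWritable> revenues, Context context)
                    throws IOException, InterruptedException {
                double total = 0.0;
                for (DoubleWritable r : revenues) {
                    total += r.get();   // sum all line-item revenue for this store
                }
                context.write(storeId, new DoubleWritable(total));
            }
        }
    }

    Structurally it is the same shape as word count (the mapper emits a grouping key with a partial value, the reducer aggregates), which is why so many tutorials start there; the driver would be wired up just like the RetailTest one above.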

  • Amogh Vasekar at Feb 18, 2010 at 7:27 pm
    Hi,
    The Hadoop World meet last year had some very interesting business solutions discussed:
    http://www.cloudera.com/company/press-center/hadoop-world-nyc/
    Most of the companies there have shared their methodology on their blogs / on SlideShare.
    One I have handy is:
    http://www.slideshare.net/hadoop/practical-problem-solving-with-apache-hadoop-pig
    It shows how Y! Search Assist is implemented.


    Amogh


  • Brian Wolf at Feb 18, 2010 at 7:34 pm
    Since I'm more or less in the same boat, this is the best I've seen, and the 2009 book is also very good:

    http://developer.yahoo.com/hadoop/

    Brian


  • Praveen Yarlagadda at Feb 18, 2010 at 11:41 pm
    I recommend the following book by Tom White:

    Hadoop: The Definitive Guide

    It will give you more details about Hadoop.

    Regards,
    Praveen
