Submitting and running hadoop jobs Programmatically
Hi,
I am working on an open source project,
Nectar<https://github.com/zinnia-phatak-dev/Nectar>, where
I am trying to create Hadoop jobs based on user input. I was
using the Java Process API to run the bin/hadoop shell script to submit the
jobs, but that does not seem like a good approach, because the process
creation model is not consistent across operating systems. Is there a better
way to submit jobs than invoking the shell script? I am using
hadoop-0.21.0, and I am running my program as the same user under which
Hadoop is installed. Some older threads said that adding the configuration
files to the classpath would make it work, but I have not been able to get it
running that way. Has anyone tried this before? If so, could you give detailed
instructions on how to achieve it? Thanks in advance for your help.

Regards,
Madhukara Phatak


  • Harsh J at Jul 26, 2011 at 10:05 am
    A simple job.submit(…) or JobClient.runJob(jobConf) submits your job
    right from the Java API. Does this not work for you? If not, what
    error do you face?

    Forking out and launching from a system process is a bad idea unless
    there's absolutely no other way.
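    As a hedged sketch of the old-API route mentioned above (JobClient.runJob
    blocks until the job finishes; the class name `SubmitSketch`, the "in"/"out"
    paths, and the MyJob.MyMapper/MyReducer classes are placeholders, not
    anything from this thread, and running it requires the Hadoop jars and a
    configured cluster on the classpath):

    ```java
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class SubmitSketch {
        public static void main(String[] args) throws Exception {
            // JobConf(Class) also sets the job jar from that class's location
            JobConf conf = new JobConf(SubmitSketch.class);
            conf.setJobName("myjob");

            // Placeholder classes; old-API mappers/reducers implement
            // org.apache.hadoop.mapred.Mapper / Reducer
            conf.setMapperClass(MyJob.MyMapper.class);
            conf.setReducerClass(MyJob.MyReducer.class);

            FileInputFormat.setInputPaths(conf, new Path("in"));
            FileOutputFormat.setOutputPath(conf, new Path("out"));

            // Submits the job and waits for it to complete, printing progress
            RunningJob rj = JobClient.runJob(conf);
            System.out.println("Job successful: " + rj.isSuccessful());
        }
    }
    ```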
    On Tue, Jul 26, 2011 at 3:28 PM, madhu phatak wrote:


    --
    Harsh J
  • Devaraj K at Jul 26, 2011 at 10:18 am
    Hi Madhu,

    You can submit jobs programmatically from any system using the Job
    API. The job submission code can be written this way:

    // Create a new Job
    Job job = new Job(new Configuration());
    job.setJarByClass(MyJob.class);

    // Specify various job-specific parameters
    job.setJobName("myjob");

    // In the new (org.apache.hadoop.mapreduce) API, input/output paths are
    // set via FileInputFormat/FileOutputFormat rather than on the Job itself
    FileInputFormat.addInputPath(job, new Path("in"));
    FileOutputFormat.setOutputPath(job, new Path("out"));

    job.setMapperClass(MyJob.MyMapper.class);
    job.setReducerClass(MyJob.MyReducer.class);

    // Submit the job and return immediately
    // (use job.waitForCompletion(true) to block until it finishes)
    job.submit();



    To submit a job this way, you need to add the Hadoop jar files and
    configuration files to the classpath of the application from which you
    want to submit the job.

    You can refer to this doc for more info on the Job API:
    http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/Job.html



    Devaraj K

    -----Original Message-----
    From: madhu phatak
    Sent: Tuesday, July 26, 2011 3:29 PM
    To: common-user@hadoop.apache.org
    Subject: Submitting and running hadoop jobs Programmatically

  • Madhu phatak at Jul 26, 2011 at 10:36 am
    Hi,
    I am using the same APIs, but I am not able to run the jobs by just adding
    the configuration files and jars. It never creates a job in Hadoop; it just
    shows "cleaning up staging area" and fails.
    On Tue, Jul 26, 2011 at 3:46 PM, Devaraj K wrote:

  • Harsh J at Jul 26, 2011 at 10:39 am
    Madhu,

    Do you get a specific error message / stack trace? Could you also
    paste your JT logs?
    On Tue, Jul 26, 2011 at 4:05 PM, madhu phatak wrote:

    --
    Harsh J
  • Madhu phatak at Jul 26, 2011 at 11:03 am
    I am using JobControl.add() to add a job, running the JobControl in
    a separate thread, and using JobControl.allFinished() to see whether all
    the jobs have completed. Does this work the same as Job.submit()?
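    The pattern described above might look roughly like this (a hedged
    sketch, assuming the org.apache.hadoop.mapreduce.lib.jobcontrol classes
    shipped with 0.21, where the add method is spelled addJob; `job` stands
    for an already-configured Job, and running it needs the Hadoop jars):

    ```java
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

    public class JobControlSketch {
        public static void runViaJobControl(Job job) throws Exception {
            JobControl control = new JobControl("my-group");

            // Wrap the configured Job; a dependency list could be passed
            // instead of null if this job depended on upstream jobs
            control.addJob(new ControlledJob(job, null));

            // JobControl implements Runnable: drive it from its own thread
            // and poll allFinished(), as described above
            Thread runner = new Thread(control);
            runner.setDaemon(true);
            runner.start();

            while (!control.allFinished()) {
                Thread.sleep(1000);
            }
            control.stop();
        }
    }
    ```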
    On Tue, Jul 26, 2011 at 4:08 PM, Harsh J wrote:

  • Harsh J at Jul 26, 2011 at 11:32 am
    Yes. Internally, it calls regular submit APIs.
    On Tue, Jul 26, 2011 at 4:32 PM, madhu phatak wrote:


    --
    Harsh J
  • Devaraj K at Jul 26, 2011 at 11:43 am
    Madhu,

    Can you check the client logs to see whether any error/exception is
    thrown while submitting the job?

    Devaraj K

  • Madhu phatak at Jul 27, 2011 at 4:55 am
    Hi,
    I am submitting the job as follows:

    java -cp
    Nectar-analytics-0.0.1-SNAPSHOT.jar:/home/hadoop/hadoop-for-nectar/hadoop-0.21.0/conf/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_COMMON_HOME/*
    com.zinnia.nectar.regression.hadoop.primitive.jobs.SigmaJob input/book.csv
    kkk11fffrrw 1

    I get the following log on the CLI:

    11/07/27 10:22:54 INFO security.Groups: Group mapping
    impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
    cacheTimeout=300000
    11/07/27 10:22:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with
    processName=JobTracker, sessionId=
    11/07/27 10:22:54 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with
    processName=JobTracker, sessionId= - already initialized
    11/07/27 10:22:54 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for
    parsing the arguments. Applications should implement Tool for the same.
    11/07/27 10:22:54 INFO mapreduce.JobSubmitter: Cleaning up the staging area
    file:/tmp/hadoop-hadoop/mapred/staging/hadoop-1331241340/.staging/job_local_0001

    It doesn't create any job in Hadoop.
    On Tue, Jul 26, 2011 at 5:11 PM, Devaraj K wrote:

  • Harsh J at Jul 27, 2011 at 6:12 am
    Madhu,

    Ditch the '*' in the classpath element that has the configuration
    directory. The directory itself ought to be on the classpath, not the
    individual files, AFAIK.

    Try that and let us know if it then picks up the proper config (right
    now, it's using the local mode).
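    Concretely, applied to the command earlier in the thread, the change is
    just dropping the trailing /* on the conf entry (paths as in the earlier
    mail; this is a config fragment, not a tested command):

    ```shell
    # Put the conf *directory* on the classpath (no trailing /*), so Hadoop
    # reads core-site.xml / mapred-site.xml from it instead of falling back
    # to local mode.
    java -cp Nectar-analytics-0.0.1-SNAPSHOT.jar:/home/hadoop/hadoop-for-nectar/hadoop-0.21.0/conf:$HADOOP_COMMON_HOME/lib/*:$HADOOP_COMMON_HOME/* \
      com.zinnia.nectar.regression.hadoop.primitive.jobs.SigmaJob input/book.csv kkk11fffrrw 1
    ```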
    On Wed, Jul 27, 2011 at 10:25 AM, madhu phatak wrote:


    --
    Harsh J
  • Madhu phatak at Jul 28, 2011 at 5:32 am
    Thank you Harsh. I am able to run the jobs by ditching the '*'.
    On Wed, Jul 27, 2011 at 11:41 AM, Harsh J wrote:

  • Steve Loughran at Jul 27, 2011 at 10:15 am

    On 27/07/11 05:55, madhu phatak wrote:
    Hi
    I am submitting the job as follows

    java -cp
    Nectar-analytics-0.0.1-SNAPSHOT.jar:/home/hadoop/hadoop-for-nectar/hadoop-0.21.0/conf/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_COMMON_HOME/*
    com.zinnia.nectar.regression.hadoop.primitive.jobs.SigmaJob input/book.csv
    kkk11fffrrw 1
    My code to submit jobs (via a declarative configuration) is up online

    http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/src/org/smartfrog/services/hadoop/operations/components/submitter/SubmitterImpl.java?revision=8590&view=markup

    It's LGPL, but ask nicely and I'll change the header to Apache.

    That code doesn't set up the classpath by pushing out more JARs (I'm
    planning to push out .groovy scripts instead), but it can also poll for
    job completion, take a timeout (useful in small test runs), and do other
    things. I currently mainly use it for testing.
  • Madhu phatak at Jul 27, 2011 at 10:34 am
    Thank you. I will have a look at it.
    On Wed, Jul 27, 2011 at 3:28 PM, Steve Loughran wrote:

Discussion Overview
group: common-user
categories: hadoop
posted: Jul 26, '11 at 9:59a
active: Jul 28, '11 at 5:32a
posts: 13
users: 4
website: hadoop.apache.org...
irc: #hadoop