Dear all,

I have run several map-reduce jobs on a Hadoop cluster of 4 nodes.

Now I want a second map-reduce job to run after the first one completes.

For example, to make my point clear, suppose a wordcount is run on a
gutenberg file in HDFS and, after it completes:

11/06/02 15:14:35 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
11/06/02 15:14:35 INFO mapred.FileInputFormat: Total input paths to process : 3
11/06/02 15:14:36 INFO mapred.JobClient: Running job: job_201106021143_0030
11/06/02 15:14:37 INFO mapred.JobClient: map 0% reduce 0%
11/06/02 15:14:50 INFO mapred.JobClient: map 33% reduce 0%
11/06/02 15:14:59 INFO mapred.JobClient: map 66% reduce 11%
11/06/02 15:15:08 INFO mapred.JobClient: map 100% reduce 22%
11/06/02 15:15:17 INFO mapred.JobClient: map 100% reduce 100%
11/06/02 15:15:25 INFO mapred.JobClient: Job complete: job_201106021143_0030
11/06/02 15:15:25 INFO mapred.JobClient: Counters: 18



Then another map-reduce job is started, either on that output or on the original data, for example:

11/06/02 15:14:36 INFO mapred.JobClient: Running job: job_201106021143_0030
11/06/02 15:14:37 INFO mapred.JobClient: map 0% reduce 0%
11/06/02 15:14:50 INFO mapred.JobClient: map 33% reduce 0%

Is this possible, and are there any parameters that achieve it?

Please advise.

Thanks


  • Harsh J at Jun 2, 2011 at 11:23 am
    Oozie's workflow feature may be exactly what you're looking for. It
    can also do much more than just chain jobs.

    Check out additional features at: http://yahoo.github.com/oozie/

    --
    Harsh J
  • Adarsh Sharma at Jun 2, 2011 at 11:35 am
    OK. Does it also work for jobs run through Hadoop Pipes?

    Thanks

  • Harsh J at Jun 2, 2011 at 11:46 am
    Yes, I believe Oozie does have Pipes and Streaming action helpers as well.

    --
    Harsh J
  • Adarsh Sharma at Jun 2, 2011 at 11:47 am
    Thanks a lot, I will let you know after some work on it.

    :-)

  • Adarsh Sharma at Jun 7, 2011 at 10:45 am

    After following the suggestions in this thread, I am confused about the
    examples used in the documentation:

    http://yahoo.github.com/oozie/releases/3.0.0/WorkflowFunctionalSpec.html#a3.2.2.3_Pipes

    What I want to achieve is for a job to terminate only when I decide, i.e.
    if I want to run another map-reduce job after the completion of one, it
    runs and then terminates after my code has executed.
    I struggled to find a simple example that demonstrates this concept. The
    Oozie documentation just sets parameters and uses them.

    For example, a simple Hadoop Pipes job is executed by:

    // WordCountMap and WordCountReduce are the job's mapper and reducer classes.
    int main(int argc, char *argv[]) {
      return HadoopPipes::runTask(
          HadoopPipes::TemplateFactory<WordCountMap, WordCountReduce>());
    }

    Now, if I want to run another job after this one on the reduced data in
    HDFS, how can this be done? Do I need to add some code?

    Thanks



  • Madhu phatak at Jun 21, 2011 at 11:15 am
    You can use ControlledJob's addDependingJob to handle dependency between
    multiple jobs.
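
For Pipes jobs submitted from the command line, the same run-after-completion dependency can also be sketched with a small driver script: submit the first job, wait for it to finish, and submit the second only if the first succeeded, pointing the second job's input at the first job's output directory. The sketch below is illustrative and was not posted in this thread. The HDFS paths and program locations used with it are hypothetical, and it assumes that the `hadoop` launcher is on the PATH and that `hadoop pipes` blocks until the job finishes and exits non-zero on failure.

```python
import subprocess


def pipes_command(program, input_dir, output_dir):
    """Build the argument list for one `hadoop pipes` submission."""
    return [
        "hadoop", "pipes",
        # Use the Java record reader/writer, as in the stock Pipes wordcount example.
        "-D", "hadoop.pipes.java.recordreader=true",
        "-D", "hadoop.pipes.java.recordwriter=true",
        "-input", input_dir,
        "-output", output_dir,
        "-program", program,
    ]


def run_chain(stages, first_input, run=subprocess.call):
    """Run stages in order; each stage reads the previous stage's output.

    stages is a list of (program, output_dir) pairs. Returns the commands
    that were run. Raises RuntimeError at the first non-zero exit code,
    so later stages are never submitted after a failure.
    """
    executed = []
    input_dir = first_input
    for program, output_dir in stages:
        cmd = pipes_command(program, input_dir, output_dir)
        executed.append(cmd)
        if run(cmd) != 0:
            raise RuntimeError("stage failed: " + " ".join(cmd))
        input_dir = output_dir  # the next job consumes this job's output
    return executed
```

A call such as `run_chain([("bin/wordcount", "gutenberg-out"), ("bin/secondjob", "second-out")], first_input="gutenberg")` would submit the wordcount first and the second program afterwards; both program paths and all three directory names are placeholders. The `run` parameter exists only so the chaining logic can be exercised without a cluster.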


Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 2, '11 at 11:18a
active: Jun 21, '11 at 11:15a
posts: 7
users: 3
website: hadoop.apache.org...
irc: #hadoop
