Hello,
I'm using Hadoop 0.20.1. I submitted a job using the
org.apache.hadoop.mapreduce.Job approach, e.g.

org.apache.hadoop.mapreduce.Job job_;
job_.submit();

However, I would like to, from another Java program, monitor this job:

1) get map/reduce progress
2) get counters
3) get status
4) and the ability to join, i.e. wait for completion.

say jd = "job_201003190003_1855"

org.apache.hadoop.mapreduce.JobID jid =
org.apache.hadoop.mapreduce.JobID.forName(jd);
org.apache.hadoop.mapred.JobClient jclient = new
org.apache.hadoop.mapred.JobClient();

Now what?
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/JobClient.html#getJob(org.apache.hadoop.mapred.JobID)

says I should use

1. public RunningJob getJob(JobID jobid)

but in my installation of Hadoop 0.20.1 I do not have such a method.

2. Even if I could, RunningJob.getCounters() returns the old deprecated
Counters. Is this compatible with the new mapreduce code?

Is there any org.apache.hadoop.mapreduce code with which, given the JobID,
I can monitor and wait for completion of the job?

I can get an org.apache.hadoop.mapreduce.JobContext from the Job ID, but
there should be a way to get the Job.

One option is to serialize the job_ variable and load it in later (not my
first choice).
Thanks

State: RUNNING
Started: Fri Mar 19 00:03:40 EDT 2010
Version: 0.20.1, r810220
Compiled: Tue Sep 1 20:55:56 UTC 2009 by oom
Identifier: 201003190003


  • Saptarshi Guha at May 26, 2010 at 6:49 pm
I found that, given the counters obtained from RunningJob, I can still use
them with the new-style counters.
But I'm still wondering how to get a RunningJob from a JobID.
Is my Hadoop too old?

    Regards
    Saptarshi

  • Saptarshi Guha at May 26, 2010 at 7:43 pm
Hmm,
oh well, a bit of effort pays off. On 0.20.2:

org.apache.hadoop.mapreduce.JobID jid =
    org.apache.hadoop.mapreduce.JobID.forName(jd);
org.apache.hadoop.mapred.JobClient jclient =
    new org.apache.hadoop.mapred.JobClient(
        org.apache.hadoop.mapred.JobTracker.getAddress(new Configuration()),
        new Configuration());
org.apache.hadoop.mapred.JobID jj =
    org.apache.hadoop.mapred.JobID.downgrade(jid);
org.apache.hadoop.mapred.RunningJob rj = jclient.getJob(jj);
System.out.println(rj);
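With that RunningJob in hand, all four items from the original wishlist can be covered. A rough sketch, untested here since it needs a live JobTracker (the job id string and the 5-second poll interval are illustrative only):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.JobTracker;
import org.apache.hadoop.mapred.RunningJob;

public class JobMonitor {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        JobClient client = new JobClient(JobTracker.getAddress(conf), conf);
        JobID id = JobID.downgrade(
                org.apache.hadoop.mapreduce.JobID.forName("job_201003190003_1855"));
        RunningJob job = client.getJob(id);

        // 1) progress and 4) join: poll until the job reports completion
        while (!job.isComplete()) {
            System.out.printf("map %3.0f%%  reduce %3.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);
        }

        // 3) status and 2) counters once the job has finished
        System.out.println("succeeded: " + job.isSuccessful());
        Counters counters = job.getCounters();
        System.out.println(counters);
    }
}
```

The downgrade from the new mapreduce.JobID to the old mapred.JobID is the bridge that makes the new-style job id usable with the old JobClient API.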

  • Arv Mistry at May 27, 2010 at 1:39 am
    Hi,

Can anyone direct me to any documentation/examples on using data encryption for map/reduce jobs? And can you do both compress and encrypt the output? Thanks for any information in advance!

    Cheers Arv
  • Ted Yu at May 27, 2010 at 2:39 am
    Owen should be able to provide more details:
    http://markmail.org/thread/d2cmsacn32vdatpl
  • Amandeep Khurana at May 27, 2010 at 2:47 am
At UCSC we are working on encryption in petascale systems and have a design
and a prototype implementation on Hadoop.

    I'm interested in seeing Owen's idea too...


    Amandeep Khurana
    Computer Science Graduate Student
    University of California, Santa Cruz

  • Allen Wittenauer at May 27, 2010 at 3:17 am
I'm pretty sure Owen was thinking that you could use a CompressionCodec to do data encryption. The only tricky part would be getting the keys there, and with 0.22's security framework that should be in place with the key store.
  • Arv Mistry at May 27, 2010 at 2:03 pm
Thanks for responding, Ted. I did see that link before but there wasn't enough detail there for me to make sense of it. I'm not sure who Owen is ;(

    Cheers Arv

    ________________________________

    From: Ted Yu
    Sent: Wed 26/05/2010 10:39 PM
    To: common-user@hadoop.apache.org
    Subject: Re: Encryption in Hadoop 0.20.1?



  • Owen O'Malley at May 27, 2010 at 3:55 pm

    On Thu, May 27, 2010 at 6:58 AM, Arv Mistry wrote:
    Thanks for responding Ted. I did see that link before but there wasn't enough details there for me to make sense of it. I'm not sure who Owen is ;(
I'm Owen, although I think I've used at least 5 different email
addresses on these lists at various times. *smile*

Since you specify 0.20, you'd probably want to put your keys into
HDFS and read them from the tasks. Note that this is *not* secure:
other users of your cluster can access your data in HDFS with only a
tiny bit of misdirection. (This will be fixed in 0.22, where we are
adding strong authentication based on Kerberos.)

The next step would be to define a compression codec that does the
encryption. So let's say you define a XorEncryption that does a simple
xor with a byte. (Obviously, you would use something better than xor;
it is just an example!) XorEncryption would need to implement
org.apache.hadoop.io.compress.CompressionCodec. You'd also need to add
your new class to the list of codecs in the configuration variable
io.compression.codecs.

For details of how to configure your mapreduce job with compression
(or in this case encryption), look at http://bit.ly/9PMHUA. If
XorEncryption returned ".xor" from getDefaultExtension(), then any file
that ended in .xor would automatically be put through the encryption,
so input is handled automatically. You need to define some configuration
variables to get it applied to the output of MapReduce.
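A minimal sketch of just the xor part (the class names and the key byte here are made up, and xor is of course not real encryption; a real XorEncryption codec would hand out streams like these from the createOutputStream()/createInputStream() methods of org.apache.hadoop.io.compress.CompressionCodec):

```java
import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Toy xor "cipher" streams, illustrating the shape of what a codec would wrap.
class XorOutputStream extends FilterOutputStream {
    private final byte key;
    XorOutputStream(OutputStream out, byte key) { super(out); this.key = key; }
    // FilterOutputStream routes bulk writes through write(int), so this suffices.
    @Override public void write(int b) throws IOException {
        out.write((b ^ key) & 0xff);
    }
}

class XorInputStream extends FilterInputStream {
    private final byte key;
    XorInputStream(InputStream in, byte key) { super(in); this.key = key; }
    @Override public int read() throws IOException {
        int b = in.read();
        return b < 0 ? b : (b ^ key) & 0xff;
    }
    // FilterInputStream delegates bulk reads straight to the wrapped stream,
    // so bulk reads must be overridden too or the xor would be skipped.
    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        for (int i = 0; i < n; i++) buf[off + i] ^= key;
        return n;
    }
}
```

Because xor is its own inverse, writing through XorOutputStream and reading back through XorInputStream with the same key round-trips the data exactly.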

    -- Owen
  • Greg Roelofs at May 28, 2010 at 1:20 am

    Owen wrote:

    For details of how to configure your mapreduce job with compression
    (or in this case encryption), look at http://bit.ly/9PMHUA.
Since Arv asked about doing both: in case it's not obvious, compress
_first_, then encrypt. (In fact, this is exactly what PGP, GnuPG, etc.
do.)
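In stream terms, that ordering means gzip is the outermost wrapper when writing and the innermost when reading. A self-contained sketch using plain JDK classes (the all-zero AES key and ECB mode are placeholders for demonstration only, not something to ship):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.SecretKeySpec;

public class CompressThenEncrypt {
    // Demo key/mode only: a fixed all-zero AES key and ECB are NOT secure.
    static final SecretKeySpec KEY = new SecretKeySpec(new byte[16], "AES");

    // Compress first, then encrypt: gzip is the outermost wrapper on write.
    static byte[] seal(byte[] plain) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, KEY);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(new CipherOutputStream(sink, c));
        gz.write(plain);
        gz.close(); // writes the gzip trailer, then the final cipher block
        return sink.toByteArray();
    }

    // The reverse order on read: decrypt first, then decompress.
    static byte[] open(byte[] sealed) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(Cipher.DECRYPT_MODE, KEY);
        InputStream in = new GZIPInputStream(
                new CipherInputStream(new ByteArrayInputStream(sealed), c));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = in.read(buf)) > 0; ) out.write(buf, 0, n);
        return out.toByteArray();
    }
}
```

Compressing first matters because ciphertext looks like random noise, and random noise does not compress; encrypting first would leave the compressor with nothing to squeeze.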

    Greg
  • Arv Mistry at Jun 1, 2010 at 2:29 pm
    Hi,

I have a Java process that writes compressed data to HDFS. The way I
am doing that is wrapping the FSDataOutputStream in a GZIPOutputStream
and calling the write() method, i.e. something like

FSDataOutputStream out = fs.create(file);
GZIPOutputStream gzip = new GZIPOutputStream(out);
gzip.write("sss".getBytes("UTF-8"));

    The file seems to get written ok.

However, when I get the file out of HDFS and try to unzip it, it
complains:

gunzip: cs_1_20100601_120000_1275396891183.cgz: unknown suffix --
ignored

When I run 'file' on it, it is recognized as 'gzip compressed data, from
FAT filesystem (MS-DOS, OS/2, NT)'.

    Any ideas? Appreciate any help.

    Cheers Arv
  • Eric Sammer at Jun 1, 2010 at 3:25 pm
This isn't really a Hadoop issue: gunzip will refuse to decompress
files that don't have a well-known suffix. Rename the file to end in
.gz and try again, or use the -S option to specify an alternate
suffix.
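As a side note, the suffix check is purely a gunzip command-line convention; the gzip format itself carries no file-name information, so the same .cgz file reads back fine programmatically. A small sketch (temp file name and payload are made up):

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SuffixDemo {
    // Write text gzip-compressed to a .cgz file, then read it back:
    // GZIPInputStream never looks at the file name, only at the stream bytes.
    static String roundTrip(String text) throws Exception {
        File f = File.createTempFile("cs_1", ".cgz");
        GZIPOutputStream gz = new GZIPOutputStream(new FileOutputStream(f));
        gz.write(text.getBytes("UTF-8"));
        gz.close(); // close()/finish() matters: it writes the gzip trailer

        GZIPInputStream in = new GZIPInputStream(new FileInputStream(f));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        for (int n; (n = in.read(buf)) > 0; ) out.write(buf, 0, n);
        in.close();
        f.delete();
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("sss")); // prints sss
    }
}
```

From the shell, `gunzip -S .cgz cs_1_20100601_120000_1275396891183.cgz` achieves the same without renaming.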


    --
    Eric Sammer
    phone: +1-917-287-2675
    twitter: esammer
    data: www.cloudera.com

Discussion Overview
group: common-user@hadoop.apache.org
category: hadoop
posted: May 26, '10 at 6:28p
active: Jun 1, '10 at 3:25p
posts: 12
users: 8
irc: #hadoop