Hi folks,

Currently we use DistCp to transfer files between two Hadoop clusters. I
have a Perl script which calls the system command “hadoop distcp ...” to
achieve this.

Is there a Java API for DistCp, so that we can avoid system calls from our
Java code?

Thanks
Balu


  • Tsz Wo (Nicholas), Sze at Feb 18, 2010 at 6:59 am
    Hi Balu,

    Unfortunately, DistCp does not have a public Java API. One simple way is to invoke DistCp.main(args) in your Java program, where args is an array of the string arguments you would pass on the command line.

    Hope this helps.
    Nicholas Sze
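The args array is simply the words you would type after "hadoop distcp": options first, then one or more sources, then the destination last. A minimal sketch of building it in Java, assuming the caller supplies those pieces; the DistCpArgs class and the nn1/nn2 hosts and paths are hypothetical placeholders, and "-update" is used only as an example of one of DistCp's command-line options:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
public class DistCpArgs {

    /**
     * Builds the array you would otherwise type after "hadoop distcp":
     * options first, then one or more sources, then the destination last.
     */
    public static String[] build(List<String> options, List<String> sources, String dest) {
        if (sources.isEmpty()) {
            throw new IllegalArgumentException("at least one source is required");
        }
        List<String> args = new ArrayList<>(options);
        args.addAll(sources);
        args.add(dest); // the last argument is the destination
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // Hypothetical NameNode hosts and paths; substitute your own.
        String[] args = build(
                Arrays.asList("-update"),
                Arrays.asList("hdfs://nn1:8020/data/in"),
                "hdfs://nn2:8020/data/in");
        // Each element is passed as-is, so no shell quoting is involved.
        System.out.println(String.join(" ", args));
    }
}
```

Each element reaches DistCp exactly as written, which avoids the quoting pitfalls of assembling a single command string for a system call.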



    ----- Original Message ----
    From: Balu Vellanki <balusmbox@gmail.com>
    To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
    Sent: Wed, February 17, 2010 5:43:11 PM
    Subject: JavaDocs for DistCp (or similar)

  • Tsz Wo (Nicholas), Sze at Feb 18, 2010 at 7:51 am
    Oops, DistCp.main(..) calls System.exit(..) at the end, so it would also terminate your Java program, which is probably not what you want. You can still use code similar to that in DistCp.main(..), shown below. However, these are not stable APIs.


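    // DistCp.main, as it appears in the DistCp source
    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(DistCp.class);
        DistCp distcp = new DistCp(job);
        int res = ToolRunner.run(distcp, args);  // runs DistCp and returns its exit code
        System.exit(res);  // ends the whole JVM; leave this out when embedding
    }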

    Nicholas
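The underlying pattern is a run method that returns an exit code, wrapped by a main that hands that code to System.exit. A self-contained sketch of that split, with a hypothetical CopyJob class standing in for DistCp (it only prints what it would copy, and is not a Hadoop API):

```java
import java.util.Arrays;

/** Hypothetical stand-in for a Tool-style class such as DistCp; not a Hadoop API. */
public class CopyJob {

    /** Does the work and reports the outcome as an exit code (0 = success). */
    public int run(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: CopyJob <src>... <dst>");
            return -1;
        }
        String[] sources = Arrays.copyOfRange(args, 0, args.length - 1);
        String dest = args[args.length - 1];
        System.out.println("would copy " + Arrays.toString(sources) + " to " + dest);
        return 0;
    }

    /** Command-line entry point: the only place that ends the JVM. */
    public static void main(String[] args) {
        System.exit(new CopyJob().run(args));
    }
}
```

An embedding program calls new CopyJob().run(args) and inspects the return value; only main ever reaches System.exit. The quoted DistCp.main follows the same split, which is why calling the ToolRunner.run(distcp, args) line yourself keeps your program alive while calling main does not.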


  • Steve Loughran at Feb 18, 2010 at 2:43 pm

    Sorry, I just replied saying roughly the same thing. Adding a formal API
    would be useful, as DistCp's implementation of Tool.run() assumes that
    System.err is the right place to log.
  • Vellanki Bala Nageshwara Rao at Feb 18, 2010 at 6:40 pm
    Thanks, folks, this helps.

    Balu

  • Steve Loughran at Feb 18, 2010 at 2:41 pm

    Tsz Wo (Nicholas), Sze wrote:
    Hi Balu,

    Unfortunately, DistCp does not have a public Java API. One simple way is to invoke DistCp.main(args) in your java program, where args is an array of the string arguments you would pass in the command line.
    That's a method with System.exit() in it, so either you run it under a
    security manager that blocks the exit, or your app exits without warning.

    Better to create your own Configuration instance, then a new DistCp object:

    // conf: a Configuration instance you created yourself
    DistCp distcp = new DistCp(conf);
    int res = ToolRunner.run(distcp, args);

    It will still log to System.out/System.err instead of going through a
    logging API, but your JVM should stay alive without you having to jump
    through security manager hoops.
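The remaining complaint is that DistCp reports through System.out/System.err rather than a logging API. One general Java workaround (not something DistCp itself offers) is to redirect both streams around the call and forward the captured text to a logger. In this sketch, task stands in for () -> ToolRunner.run(distcp, args); CapturedRun and Result are hypothetical names, and java.util.logging is used only to keep the example dependency-free.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.concurrent.Callable;
import java.util.logging.Logger;

public class CapturedRun {
    private static final Logger LOG = Logger.getLogger(CapturedRun.class.getName());

    /** The task's exit code plus everything it printed to System.out/System.err. */
    public static final class Result {
        public final int exitCode;
        public final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /**
     * Runs the task with System.out and System.err redirected into a buffer,
     * restores the original streams, then forwards the captured text to a Logger.
     * Caution: the redirect is JVM-wide, so other threads' console output is captured too.
     */
    public static Result run(Callable<Integer> task) throws Exception {
        PrintStream oldOut = System.out;
        PrintStream oldErr = System.err;
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream capture = new PrintStream(buf, true); // autoflush, default charset
        int code;
        System.setOut(capture);
        System.setErr(capture);
        try {
            code = task.call();
        } finally {
            // Always restore the real streams, even if the task throws.
            capture.flush();
            System.setOut(oldOut);
            System.setErr(oldErr);
        }
        String text = buf.toString(); // same default charset as the PrintStream
        if (!text.isEmpty()) {
            LOG.info(text);
        }
        return new Result(code, text);
    }
}
```

Because System.setOut/System.setErr affect the whole JVM, this only suits a caller that runs one copy at a time and can tolerate other threads' console output landing in the same buffer. It also buffers everything, so nothing is logged until the copy finishes.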

Discussion Overview
group: common-user @
categories: hadoop
posted: Feb 18, '10 at 6:42a
active: Feb 18, '10 at 6:40p
posts: 6
users: 4
website: hadoop.apache.org...
irc: #hadoop
