FAQ
When I want to launch a Hadoop job I use SCP to execute a command on the
NameNode machine. I am wondering if there is a way to launch a Hadoop job
from a machine that is not on the cluster. How to do this on a Windows box
or a Mac would be of special interest.

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com


  • Harsh J at May 29, 2011 at 2:51 am
    Keep a local Hadoop installation whose config is a mirror copy of the
    cluster's, and use "hadoop jar <jar>" to submit as usual (since the config
    points at the cluster, the jobs go there).

    For Windows you'd need Cygwin installed, however.
    --
    Harsh J
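
    The "hadoop jar" route works because the launcher script puts the mirrored
    conf directory on the classpath, so the client-side Configuration resolves
    the filesystem and jobtracker addresses to the cluster. As a quick sanity
    check (a minimal sketch; the conf-directory paths are placeholders for
    wherever the copied *-site.xml files live on the client):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;

        // Verifies that a client-side Configuration picks up the mirrored
        // cluster settings; the paths below are placeholders.
        public class ShowClusterConf {
            public static void main(String[] args) {
                Configuration conf = new Configuration();
                conf.addResource(new Path("/path/to/mirrored-conf/core-site.xml"));
                conf.addResource(new Path("/path/to/mirrored-conf/mapred-site.xml"));
                // Should print the cluster's values, not the local defaults.
                System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
                System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
            }
        }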
  • Ferdy Galema at May 29, 2011 at 1:20 pm
    Would it not also be possible for a Windows machine to submit the job
    directly from a Java process? This way you don't need Cygwin or a full
    local copy of the installation (correct me if I'm wrong). The steps would
    then just be:
    1) Create a basic Java project, add the minimum required libraries
       (Hadoop/logging)
    2) Set the essential properties (at least the jobtracker and the filesystem)
    3) Implement the Tool
    4) Run the process (from either the IDE or a stand-alone jar)

    Steps 1-3 could technically be implemented on another machine, if you
    choose to compile a stand-alone jar; a minimal sketch follows below.

    Ferdy.
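
    A minimal sketch of those four steps (the class name, host names, and ports
    below are made-up placeholders; the property names are the 0.20-era ones,
    so check them against your cluster's configuration):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.conf.Configured;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
        import org.apache.hadoop.util.Tool;
        import org.apache.hadoop.util.ToolRunner;

        // Hypothetical remote-submission driver: point the client at the
        // cluster and ship the job classes as a pre-built jar.
        public class RemoteSubmit extends Configured implements Tool {

            public int run(String[] args) throws Exception {
                Configuration conf = getConf();
                conf.set("fs.default.name", "hdfs://namenode-host:9000");  // filesystem
                conf.set("mapred.job.tracker", "jobtracker-host:9001");    // jobtracker
                conf.set("mapred.jar", "myjob.jar");                       // jar with job classes

                Job job = new Job(conf, "remote submission example");
                job.setJarByClass(RemoteSubmit.class);
                // ... set mapper, reducer, key/value classes here ...
                FileInputFormat.addInputPath(job, new Path(args[0]));
                FileOutputFormat.setOutputPath(job, new Path(args[1]));
                return job.waitForCompletion(true) ? 0 : 1;
            }

            public static void main(String[] args) throws Exception {
                System.exit(ToolRunner.run(new RemoteSubmit(), args));
            }
        }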
  • Steve Lewis at May 31, 2011 at 3:46 pm
    I have tried what you suggest (well, sort of); a good example would help a
    lot. My reducer is set to, among other things, emit the local OS and
    user.dir. When I try running from my Windows box these appear on HDFS but
    show the Windows OS and user.dir, leading me to believe that the reducer is
    still running on my Windows machine. I will check the values, but a working
    example would be very useful.

  • Harsh J at May 31, 2011 at 4:36 pm
    Steve,

    What do you mean when you say it shows "windows os" and "user.dir"?
    There will be a few properties in the job.xml that may carry client-machine
    information, but these shouldn't be a hindrance.

    Unless a TaskTracker was started on the Windows box (no daemons ought
    to be started on the client machine), no task can run on it.
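
    A quick way to settle it is to emit the executing host's name from inside
    the reduce call (a minimal sketch, along the lines of the InetAddress code
    commented out in the reducer shown below):

        import java.io.IOException;
        import java.net.InetAddress;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Reducer;

        // Diagnostic reducer sketch: writes the hostname of the machine the
        // task actually runs on, making cluster vs. local execution explicit.
        public class HostReportingReducer extends Reducer<Text, Text, Text, Text> {
            @Override
            protected void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                context.write(new Text("task.host"),
                        new Text(InetAddress.getLocalHost().getHostName()));
                for (Text value : values) {
                    context.write(key, value);
                }
            }
        }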
  • Steve Lewis at May 31, 2011 at 4:57 pm
    My Reducer code says this:
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        private boolean m_DateSent;

        /**
         * This method is called once for each key. Most applications will define
         * their reduce class by overriding this method. The default implementation
         * is an identity function.
         */
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            if (!m_DateSent) {
                Text dkey = new Text("CreationDate");
                Text dValue = new Text();
                writeKeyValue(context, dkey, dValue, "CreationDate", new Date().toString());
                writeKeyValue(context, dkey, dValue, "user.dir", System.getProperty("user.dir"));
                writeKeyValue(context, dkey, dValue, "os.arch", System.getProperty("os.arch"));
                writeKeyValue(context, dkey, dValue, "os.name", System.getProperty("os.name"));

                // dkey.set("ip");
                // java.net.InetAddress addr = java.net.InetAddress.getLocalHost();
                // dValue.set(System.getProperty(addr.toString()));
                // context.write(dkey, dValue);

                m_DateSent = true;
            }
            Iterator<Text> itr = values.iterator();
            // Add interesting code here
            while (itr.hasNext()) {
                Text vCheck = itr.next();
                context.write(key, vCheck);
            }
        }
    }

    If os.arch reports Linux I am running on the cluster;
    if it reports Windows I am running locally.

    I run this main hoping to run on the cluster, with the NameNode and
    JobTracker at glados:

    public static void main(String[] args) throws Exception {
        String outFile = "./out";
        Configuration conf = new Configuration();

        // cause output to go to the cluster
        conf.set("fs.default.name", "hdfs://glados:9000/");
        conf.set("mapreduce.jobtracker.address", "glados:9000/");
        conf.set("mapred.jar", "NShot.jar");

        conf.set("fs.defaultFS", "hdfs://glados:9000/");

        Job job = new Job(conf, "Generated data");
        conf = job.getConfiguration();
        job.setJarByClass(NShotInputFormat.class);

        // ... other setup code ...

        boolean ans = job.waitForCompletion(true);
        int ret = ans ? 0 : 1;
    }


  • Harsh J at May 31, 2011 at 5:24 pm
    Simply remove that trailing slash (forgot to catch it earlier, sorry)
    and you should be set (or at least further along than before, surely).
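
    Concretely, for 0.20.x the client-side settings would then look something
    like the sketch below. (The JobTracker normally listens on its own RPC
    port, often 9001 rather than the NameNode's 9000, so the port here is an
    assumption; use whatever mapred.job.tracker says in the cluster's
    mapred-site.xml.)

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;

        // Sketch of the corrected 0.20.x client settings; the JobTracker
        // port is a guess and must match the cluster's configuration.
        public class CorrectedSubmit {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                conf.set("fs.default.name", "hdfs://glados:9000");  // NameNode
                conf.set("mapred.job.tracker", "glados:9001");      // no trailing slash
                conf.set("mapred.jar", "NShot.jar");                // jar with the job classes
                Job job = new Job(conf, "Generated data");
                // ... input/output formats, mapper and reducer as in the original main() ...
                System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
        }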
    On Tue, May 31, 2011 at 10:51 PM, Steve Lewis wrote:
    0.20.2 - we have been avoiding 0.21 because it is not terribly stable and
    made some MAJOR changes to critical classes.

    When I say

        Configuration conf = new Configuration();
        // cause output to go to the cluster
        conf.set("fs.default.name", "hdfs://glados:9000/");
        // conf.set("mapreduce.jobtracker.address", "glados:9000/");
        conf.set("mapred.job.tracker", "glados:9000/");
        conf.set("mapred.jar", "NShot.jar");
        // conf.set("fs.defaultFS", "hdfs://glados:9000/");
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        // if (otherArgs.length != 2) {
        //     System.err.println("Usage: wordcount <in> <out>");
        //     System.exit(2);
        // }
        Job job = new Job(conf, "Generated data");
    I get

        Exception in thread "main" java.lang.IllegalArgumentException:
        java.net.URISyntaxException: Relative path in absolute URI: glados:9000
            at org.apache.hadoop.fs.Path.initialize(Path.java:140)
            at org.apache.hadoop.fs.Path.<init>(Path.java:126)
            at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:150)
            at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:123)
            at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:1807)
            at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
            at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
            at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
            at org.apache.hadoop.mapreduce.Job.<init>(Job.java:54)
            at org.systemsbiology.hadoopgenerated.NShotTest.main(NShotTest.java:188)
        Caused by: java.net.URISyntaxException: Relative path in absolute URI: glados:9000
            at java.net.URI.checkPath(URI.java:1787)
            at java.net.URI.<init>(URI.java:735)
            at org.apache.hadoop.fs.Path.initialize(Path.java:137)

    I promise to publish a working example if this ever works
    On Tue, May 31, 2011 at 10:02 AM, Harsh J wrote:

    Steve,

    On Tue, May 31, 2011 at 10:27 PM, Steve Lewis <lordjoe2000@gmail.com> wrote:

        My Reducer code says this:
            writeKeyValue(context, dkey, dValue, "os.arch", System.getProperty("os.arch"));
            writeKeyValue(context, dkey, dValue, "os.name", System.getProperty("os.name"));
        if os.arch is linux I am running on the cluster -
        if windows I am running locally

    Correct, so it should be Linux since these are System properties, and
    if you're getting Windows it's probably running locally on your client
    box itself!

        conf.set("mapreduce.jobtracker.address", "glados:9000/");

    This here might be your problem. That form of property would only work
    with 0.21.x, while on 0.20.x if you do not set it as "mapred.job.tracker"
    then the local job runner takes over by default, thereby making this odd
    thing happen (that's my guess).

    What version of Hadoop are you using?

  • Steve Loughran at Jun 6, 2011 at 9:36 am
