Hi,

I've been playing with 0.23.0; really nice stuff! I was able to set up a
small test cluster (40 nodes) and launch the example jobs. I was also
able to recompile old Hadoop programs against the new jars and run
those programs as well. My question is the following:

We have an HDFS instance based on 0.20 that I would like to hook up to
YARN. This appears to be a bit of work. Launching the jobs gives me
the following error:

2011-12-05 15:48:05,023 INFO ipc.YarnRPC (YarnRPC.java:create(47)) - Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
2011-12-05 15:48:05,040 INFO mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:<init>(95)) - Connecting to ResourceManager at {removed}.{xxx}/{removed}:50177
2011-12-05 15:48:05,041 INFO ipc.HadoopYarnRPC (HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ClientRMProtocol
2011-12-05 15:48:05,121 INFO mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:<init>(99)) - Connected to ResourceManager at {removed}.{xxx}/{removed}:50177
2011-12-05 15:48:05,133 INFO mapreduce.Cluster (Cluster.java:initialize(116)) - Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: java.lang.ClassNotFoundException: org.apache.hadoop.fs.Hdfs
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:123)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:85)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:78)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1129)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1125)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
    at org.apache.hadoop.mapreduce.Job.connect(Job.java:1124)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1153)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1176)
    at org.apache.giraph.graph.GiraphJob.run(GiraphJob.java:560)
    at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:193)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
    at org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:201)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
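Judging from the log, the cluster setup appears to try each available client protocol provider in turn, record a provider's failure only as an INFO line, and raise a generic IOException once none succeeds. That is why the real cause (the ClassNotFoundException) is easy to miss. The following is a minimal, hypothetical sketch of that fallback pattern in plain Java; ProviderFallback and Provider are made-up names, and this is not Hadoop's actual code.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;

final class ProviderFallback {
  // Hypothetical provider entry (not Hadoop's ClientProtocolProvider):
  // a display name plus a factory that may throw.
  static final class Provider {
    final String name;
    final Callable<Object> factory;

    Provider(String name, Callable<Object> factory) {
      this.name = name;
      this.factory = factory;
    }
  }

  // Try each provider in order. A failure is only printed as an INFO-style
  // line; the caller sees a generic IOException once every provider fails.
  static Object initialize(List<Provider> providers) throws IOException {
    for (Provider p : providers) {
      try {
        Object client = p.factory.call();
        if (client != null) {
          return client; // first provider that yields a client wins
        }
      } catch (Exception e) {
        System.err.println("INFO Failed to use " + p.name + " due to error: " + e);
      }
    }
    throw new IOException("Cannot initialize Cluster. Please check your configuration"
        + " for mapreduce.framework.name and the correspond server addresses.");
  }

  public static void main(String[] args) {
    Provider yarn = new Provider("YarnClientProtocolProvider", () -> {
      throw new ClassNotFoundException("org.apache.hadoop.fs.Hdfs");
    });
    try {
      initialize(Arrays.asList(yarn));
    } catch (IOException e) {
      // Only the INFO line on stderr names the missing class.
      System.out.println("Caught: " + e.getMessage());
    }
  }
}
```

When debugging this kind of failure, the INFO line just before the exception is the one worth reading.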

After doing a little digging, it appears that YarnClientProtocolProvider
creates a YARNRunner that uses org.apache.hadoop.fs.Hdfs, a class that
is not available in older versions of HDFS.
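One way to surface this earlier than job submission is to probe the classpath for the class named in the ClassNotFoundException before submitting. The following is a minimal sketch using only the JDK's Class.forName(String, boolean, ClassLoader); HdfsClasspathCheck is a hypothetical name, and treating a missing org.apache.hadoop.fs.Hdfs as a sign of an older HDFS client follows the diagnosis above rather than any documented Hadoop check.

```java
final class HdfsClasspathCheck {
  // True if the class can be located by the loader. initialize=false
  // avoids running static initializers just to probe.
  static boolean isPresent(String className, ClassLoader loader) {
    try {
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    ClassLoader loader = HdfsClasspathCheck.class.getClassLoader();
    // The class named in the ClassNotFoundException in the log above.
    String required = "org.apache.hadoop.fs.Hdfs";
    if (isPresent(required, loader)) {
      System.out.println(required + " found");
    } else {
      System.err.println(required + " missing; the HDFS client jars on the classpath"
          + " are likely an older (e.g. 0.20-based) version");
    }
  }
}
```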

Which versions of HDFS are currently supported, and which are planned
for support? It would be great to be able to run YARN on legacy HDFS
installations.

Thanks,

Avery

  • Mahadev Konar at Dec 6, 2011 at 5:14 am
    Avery,
    Currently we have only tested 0.23 MRv2 with 0.23 HDFS. I might be
    wrong, but looking at the HDFS APIs, it doesn't look like it would
    be a lot of work to get it working with the 0.20 APIs. We had been
    using the FileContext APIs initially but have transitioned back to
    the old APIs.

    Hope that helps.

    mahadev
    On Mon, Dec 5, 2011 at 4:01 PM, Avery Ching wrote:
  • Avery Ching at Dec 6, 2011 at 6:59 am
    Thank you for the response; that's what I thought as well =). I spent
    the day trying to port the required 0.23 APIs to 0.20 HDFS. There have
    been a lot of API changes!

    Avery
  • Arun C Murthy at Dec 6, 2011 at 4:52 pm
    Avery,

    They aren't 'API changes'. HDFS in hadoop-0.23 simply has a new set of APIs (aka the FileContext APIs). Both the old (FileSystem) APIs and the new ones are supported in hadoop-0.23.

    We have used the new HDFS APIs in YARN in some places.

    hth,
    Arun
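Since both API families exist in 0.23 but only FileSystem exists in 0.20, porting YARN code mostly means mapping FileContext calls back to FileSystem calls. The following is a rough correspondence table written as Java, to the best of my understanding of the 0.23 javadocs; "fc" and "fs" are hypothetical FileContext and FileSystem instances, and the exact signatures should be checked against the javadocs for the versions in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FileContextToFileSystem {
  // Rough FileContext (0.23) -> FileSystem (0.20 and 0.23) correspondence.
  // Semantics are not always identical; see the comments on each entry.
  static Map<String, String> mapping() {
    Map<String, String> m = new LinkedHashMap<>();
    m.put("FileContext.getFileContext(conf)", "FileSystem.get(conf)");
    // Both forms overwrite an existing file.
    m.put("fc.create(path, EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE))",
        "fs.create(path, true)");
    m.put("fc.open(path)", "fs.open(path)");
    // createParent=true corresponds to mkdirs creating missing parents.
    m.put("fc.mkdir(dir, perm, true)", "fs.mkdirs(dir, perm)");
    // fc.rename is void and throws on failure; fs.rename returns a boolean.
    m.put("fc.rename(src, dst)", "fs.rename(src, dst)");
    m.put("fc.delete(path, recursive)", "fs.delete(path, recursive)");
    m.put("fc.getFileStatus(path)", "fs.getFileStatus(path)");
    m.put("fc.util().exists(path)", "fs.exists(path)");
    return m;
  }

  public static void main(String[] args) {
    for (Map.Entry<String, String> e : mapping().entrySet()) {
      System.out.println(e.getKey() + "  ->  " + e.getValue());
    }
  }
}
```

Most entries are one-to-one; rename is the one where the error-reporting contract differs.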
  • Avery Ching at Dec 6, 2011 at 6:05 pm
    I think it would be nice if YARN could work with existing, older HDFS
    instances; a lot of folks will be slow to upgrade HDFS when all their
    important data is on it. I guess I could also go that route.

    Avery
  • Arun C Murthy at Dec 6, 2011 at 11:51 pm
    Avery,

    If you could take a look at what it would take, I'd be grateful. I'm hoping it isn't very much effort.

    thanks,
    Arun
  • Avery Ching at Dec 8, 2011 at 9:30 pm
    I was able to convert FileContext to FileSystem and related methods
    fairly straightforwardly, but am running into issues with security
    incompatibilities (e.g. UserGroupInformation). Yuck.

    Avery
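One subtlety in converting FileContext calls to FileSystem calls is error reporting: FileContext.rename returns void and throws on failure, while FileSystem.rename(Path, Path) returns a boolean. A ported call site that relied on an exception can therefore silently ignore a failed rename. The following is a minimal sketch of a shim that restores the throwing contract; RenameShim and BooleanRename are hypothetical names, and plain strings stand in for Hadoop Path objects.

```java
import java.io.IOException;

final class RenameShim {
  // Hypothetical stand-in for a FileSystem-style rename that reports
  // failure with a false return value.
  interface BooleanRename {
    boolean rename(String src, String dst) throws IOException;
  }

  // FileContext-style contract: return normally on success, throw on failure.
  static void renameOrThrow(BooleanRename fs, String src, String dst) throws IOException {
    if (!fs.rename(src, dst)) {
      throw new IOException("Failed to rename " + src + " to " + dst);
    }
  }

  public static void main(String[] args) {
    BooleanRename alwaysFails = (src, dst) -> false;
    try {
      renameOrThrow(alwaysFails, "/tmp/job/_temporary", "/tmp/job/output");
    } catch (IOException e) {
      System.out.println(e.getMessage()); // prints "Failed to rename /tmp/job/_temporary to /tmp/job/output"
    }
  }
}
```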
  • Arun C Murthy at Dec 9, 2011 at 11:15 pm
    I assume you have security switched off.

    What issues are you running into?
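Whether security is "switched off" is controlled by the hadoop.security.authentication property: "simple" (the default) means off, and "kerberos" means on. The following is a minimal sketch of that check. A plain Map stands in for Hadoop's Configuration, and SecurityMode is a hypothetical name; in 0.23 the real check is UserGroupInformation.isSecurityEnabled().

```java
import java.util.HashMap;
import java.util.Map;

final class SecurityMode {
  // Configuration key Hadoop uses to choose the authentication method.
  static final String AUTH_KEY = "hadoop.security.authentication";

  // Treated as off unless the key is explicitly set to "kerberos".
  static boolean isSecurityEnabled(Map<String, String> conf) {
    String method = conf.getOrDefault(AUTH_KEY, "simple").trim();
    return method.equalsIgnoreCase("kerberos");
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(isSecurityEnabled(conf)); // false: unset means simple
    conf.put(AUTH_KEY, "kerberos");
    System.out.println(isSecurityEnabled(conf)); // true
  }
}
```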
    On Dec 8, 2011, at 1:30 PM, Avery Ching wrote:

    I was able to convert FileContext to FileSystem and related methods fairly straightforwardly, but am running into issues of dealing with security incompatibilites (i.e. UserGroupInformation, etc.). Yuck.

    Avery
    On 12/6/11 3:50 PM, Arun C Murthy wrote:
    Avery,

    If you could take a look at what it would take, I'd be grateful. I'm hoping it isn't very much effort.

    thanks,
    Arun
    On Dec 6, 2011, at 10:05 AM, Avery Ching wrote:

    I think it would be nice if YARN could work on existing older HDFS instances, a lot of folks will be slow to upgrade HDFS with all their important data on it. I could also go that route I guess.

    Avery
    On 12/6/11 8:51 AM, Arun C Murthy wrote:
    Avery,

    They aren't 'api changes'. HDFS just has a new set of apis in hadoop-0.23 (aka FileContext apis). Both the old (FileSystem apis) and new are supported in hadoop-0.23.

    We have used the new HDFS apis in YARN in some places.

    hth,
    Arun
    On Dec 5, 2011, at 10:59 PM, Avery Ching wrote:

    Thank you for the response, that's what I thought as well =). I spent the day trying to port the required 0.23 APIs to 0.20 HDFS. There have been a lot of API changes!

    Avery
    On 12/5/11 9:14 PM, Mahadev Konar wrote:
    Avery,
    Currently we have only tested 0.23 MRv2 with 0.23 HDFS. I might be
    wrong, but looking at the HDFS APIs, it doesn't look like it would
    be a lot of work to get it working with the 0.20 APIs. We had been
    using the FileContext APIs initially but have transitioned back to the
    old APIs.

    Hope that helps.

    mahadev

    On Mon, Dec 5, 2011 at 4:01 PM, Avery Ching wrote:
    Hi,

    I've been playing with 0.23.0, really nice stuff! I was able to set up a
    small test cluster (40 nodes) and launch the example jobs. I was also able
    to recompile old Hadoop programs with the new jars and start up those
    programs as well. My question is the following:

    We have an HDFS instance based on 0.20 that I would like to hook up to YARN.
    This appears to be a bit of work. Launching the jobs gives me the
    following error:

    2011-12-05 15:48:05,023 INFO ipc.YarnRPC (YarnRPC.java:create(47)) - Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
    2011-12-05 15:48:05,040 INFO mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:<init>(95)) - Connecting to ResourceManager at {removed}.{xxx}/{removed}:50177
    2011-12-05 15:48:05,041 INFO ipc.HadoopYarnRPC (HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ClientRMProtocol
    2011-12-05 15:48:05,121 INFO mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:<init>(99)) - Connected to ResourceManager at {removed}.{xxx}/{removed}:50177
    2011-12-05 15:48:05,133 INFO mapreduce.Cluster (Cluster.java:initialize(116)) - Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: java.lang.ClassNotFoundException: org.apache.hadoop.fs.Hdfs
    Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:123)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:85)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:78)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1129)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1125)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
    at org.apache.hadoop.mapreduce.Job.connect(Job.java:1124)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1153)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1176)
    at org.apache.giraph.graph.GiraphJob.run(GiraphJob.java:560)
    at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:193)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
    at org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:201)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
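The two messages at the top of this trace fit together once you notice that, as the log suggests, the client tries each client protocol provider in turn. The ClassNotFoundException from YarnClientProtocolProvider is only logged at INFO, and the IOException about mapreduce.framework.name is thrown afterwards because no provider succeeded. The sketch below illustrates that try-each-provider pattern. ProviderLoop and its Provider interface are hypothetical names, and Hadoop's actual Cluster.initialize may differ in its details.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical sketch of the "try each provider, keep the first that works"
// pattern suggested by the log; names and messages are illustrative only.
class ProviderLoop {

    interface Provider {
        String name();

        /** Returns a client, or throws if this provider cannot be used. */
        Object create() throws Exception;

        static Provider of(String name, Callable<Object> factory) {
            return new Provider() {
                @Override public String name() { return name; }
                @Override public Object create() throws Exception { return factory.call(); }
            };
        }
    }

    /**
     * Tries each provider in order. Failures are only recorded in the log,
     * so the root cause (e.g. a ClassNotFoundException) is easy to miss; the
     * caller just sees the generic IOException thrown at the end.
     */
    static Object initialize(List<Provider> providers, List<String> log) throws IOException {
        for (Provider p : providers) {
            try {
                Object client = p.create();
                if (client != null) {
                    return client;
                }
            } catch (Exception | LinkageError e) {
                // NoClassDefFoundError is a LinkageError, not an Exception.
                log.add("Failed to use " + p.name() + " due to error: " + e);
            }
        }
        throw new IOException("Cannot initialize Cluster. Please check your configuration"
                + " for mapreduce.framework.name");
    }
}
```

The practical takeaway is to look for the INFO-level "Failed to use ..." line rather than trusting the final exception message, which only says that every provider failed.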

    After doing a little digging it appears that YarnClientProtocolProvider
    creates a YARNRunner that uses org.apache.hadoop.fs.Hdfs, a class that is
    not available in older versions of HDFS.

    What versions of HDFS are currently supported and what HDFS versions are
    planned for support? It would be great to be able to run YARN on legacy
    HDFS installations.

    Thanks,

    Avery
  • Avery Ching at Dec 10, 2011 at 5:22 pm
    Well, UserGroupInformation and its subclasses in 0.20 are very different
    from those in 0.23. For instance, in 0.23 UserGroupInformation has a
    private constructor, which causes problems because
    UnixUserGroupInformation extends UserGroupInformation. The RPC class
    also appears to be very different between 0.20 and 0.23: YARN's
    ClientRMProtocolPBClientImpl uses methods like RPC.setProtocolEngine,
    which aren't available in the 0.20 RPC.
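Both kinds of incompatibility described here can be detected with reflection before attempting a port. In the hypothetical helper below, subclassableFromSamePackage reports whether a subclass in the same package could call any constructor of a class, and hasPublicMethod checks whether a class exposes a method with a given name, such as setProtocolEngine. The two Ugi* classes are stand-ins for the shapes described above, not the real Hadoop classes.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical compatibility checks; not part of Hadoop.
class ApiCompat {

    /**
     * True if a subclass declared in the same package could invoke at least
     * one constructor of {@code type}: the class must not be final or an
     * interface, and some constructor must be non-private.
     */
    static boolean subclassableFromSamePackage(Class<?> type) {
        if (type.isInterface() || Modifier.isFinal(type.getModifiers())) {
            return false;
        }
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            // Skip compiler-generated constructors; they are not part of the source API.
            if (!c.isSynthetic() && !Modifier.isPrivate(c.getModifiers())) {
                return true;
            }
        }
        return false;
    }

    /** True if {@code type} exposes a public method (static or not) with this name. */
    static boolean hasPublicMethod(Class<?> type, String methodName) {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }
}

// Stand-ins for the two shapes described above (not the real Hadoop classes).
class UgiWithProtectedCtor {
    protected UgiWithProtectedCtor() {}
}

class UgiWithPrivateCtor {
    private UgiWithPrivateCtor() {}

    static UgiWithPrivateCtor create() {
        return new UgiWithPrivateCtor();
    }
}
```

With the 0.20 jars on the classpath, ApiCompat.hasPublicMethod(Class.forName("org.apache.hadoop.ipc.RPC"), "setProtocolEngine") would be expected to return false, per the observation above.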

    For what it's worth, I was able to get YARN to start a job with the 0.20
    RPC (after a lot of hacks), but then ran into:

    Caused by: com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RPC$VersionMismatch: Server IPC version 5 cannot communicate with client version 6
    at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
    at $Proxy0.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getClusterMetrics(ClientRMProtocolPBClientImpl.java:128)
    ... 24 more
    Caused by: org.apache.hadoop.ipc.RPC$VersionMismatch: Server IPC version 5 cannot communicate with client version 6
    at org.apache.hadoop.ipc.Client.call(Client.java:1125)
    at org.apache.hadoop.ipc.Client.call(Client.java:1095)
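As the message suggests, the two ends disagree on the IPC wire-protocol version itself, so porting individual APIs on one side does not help: both ends have to speak the same IPC version. The sketch below models that kind of strict handshake under stated assumptions. The client opens with a magic prefix followed by a single version byte (loosely modeled on the "hrpc" magic at the start of Hadoop's IPC connection header), and the server refuses any version other than its own. IpcHandshake is hypothetical and is not Hadoop's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of an IPC connection handshake; not Hadoop's implementation.
class IpcHandshake {

    /** Assumed magic prefix, loosely modeled on Hadoop's "hrpc" header. */
    static final byte[] MAGIC = "hrpc".getBytes(StandardCharsets.US_ASCII);

    /** Builds the header a client sends first: the magic bytes plus one version byte. */
    static byte[] clientHeader(int clientVersion) {
        byte[] header = Arrays.copyOf(MAGIC, MAGIC.length + 1);
        header[MAGIC.length] = (byte) clientVersion;
        return header;
    }

    /**
     * Server-side check. Returns null if the connection may proceed, otherwise
     * an error message. Versions must match exactly: there is no negotiation.
     */
    static String check(byte[] header, int serverVersion) {
        if (header.length != MAGIC.length + 1
                || !Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC)) {
            return "Bad connection header";
        }
        int clientVersion = header[MAGIC.length];
        if (clientVersion != serverVersion) {
            return "Server IPC version " + serverVersion
                    + " cannot communicate with client version " + clientVersion;
        }
        return null;
    }
}
```

With serverVersion 5 and a header built by clientHeader(6), check returns the same sentence that appears in the trace, while clientHeader(5) is accepted.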

    Avery

Discussion Overview
group: mapreduce-user @
categories: hadoop
posted: Dec 6, '11 at 12:01a
active: Dec 10, '11 at 5:22p
posts: 9
users: 3
website: hadoop.apache.org...
irc: #hadoop


site design / logo © 2023 Grokbase