Hi,
I've been playing with 0.23.0, really nice stuff! I was able to set up a
small test cluster (40 nodes) and launch the example jobs. I was also
able to recompile old Hadoop programs with the new jars and start up
those programs as well. My question is the following:
We have an HDFS instance based on 0.20 that I would like to hook up to
YARN. This appears to be a bit of work. Launching the jobs gives me
the following error:
2011-12-05 15:48:05,023 INFO ipc.YarnRPC (YarnRPC.java:create(47)) -
Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
2011-12-05 15:48:05,040 INFO mapred.ResourceMgrDelegate
(ResourceMgrDelegate.java:<init>(95)) - Connecting to ResourceManager at
{removed}.{xxx}/{removed}:50177
2011-12-05 15:48:05,041 INFO ipc.HadoopYarnRPC
(HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc
proxy for protocol interface org.apache.hadoop.yarn.api.ClientRMProtocol
2011-12-05 15:48:05,121 INFO mapred.ResourceMgrDelegate
(ResourceMgrDelegate.java:<init>(99)) - Connected to ResourceManager at
{removed}.{xxx}/{removed}:50177
2011-12-05 15:48:05,133 INFO mapreduce.Cluster
(Cluster.java:initialize(116)) - Failed to use
org.apache.hadoop.mapred.YarnClientProtocolProvider due to error:
java.lang.ClassNotFoundException: org.apache.hadoop.fs.Hdfs
Exception in thread "main" java.io.IOException: Cannot initialize
Cluster. Please check your configuration for mapreduce.framework.name
and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:123)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:78)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1129)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1125)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1124)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1153)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1176)
at org.apache.giraph.graph.GiraphJob.run(GiraphJob.java:560)
at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:193)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
at org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:201)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
After doing a little digging, it appears that YarnClientProtocolProvider
creates a YARNRunner that uses org.apache.hadoop.fs.Hdfs, a class that
is not available in older versions of HDFS.
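For anyone who wants to check whether a given classpath provides that class, a small reflection probe along these lines should do. This is only a sketch: the class name HdfsClassCheck and the helper isClassAvailable are made up for illustration and are not part of Hadoop.

```java
// Hypothetical helper (not part of Hadoop): reports whether a class can be
// loaded from the classpath this program is started with.
public class HdfsClassCheck {

    // Returns true if the named class can be found by this class's loader.
    // The class is located but not initialized, so no static code runs.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, HdfsClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the ClassNotFoundException above.
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.fs.Hdfs";
        System.out.println(name + " available: " + isClassAvailable(name));
    }
}
```

Run it with the same classpath the job client uses. If it reports false for org.apache.hadoop.fs.Hdfs, the HDFS jars on that classpath do not provide the class, which matches the error in the log.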
What versions of HDFS are currently supported and what HDFS versions are
planned for support? It would be great to be able to run YARN on legacy
HDFS installations.
Thanks,
Avery