Hi all,
I'm still new to Hadoop. I'd like to use Hadoop streaming to combine a
mapper written as a Java class with a reducer written as a C++ program.
I'm just starting on this task and am already having trouble with the
Java class. It looks something like this:
    package org.company;
    ...
    public class TestMapper extends MapReduceBase implements Mapper {
        ...
        public void map(WritableComparable key, Writable value,
                OutputCollector output, Reporter reporter) throws IOException {
            ...
I created a jar file containing my class, and it is accessible via
$CLASSPATH. I'm running the streaming job with:
    $HSTREAMING -mapper org.company.TestMapper -reducer "wc -l" \
        -input /data -output /out1
Hadoop cannot find the TestMapper class. I'm using hadoop-0.16.0. The
error is:
===========================
2008-03-07 18:58:07,734 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2008-03-07 18:58:07,833 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2008-03-07 18:58:07,910 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.company.TestMapper
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:639)
    at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:728)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:204)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.company.TestMapper
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:607)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:631)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: org.company.TestMapper
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:587)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:605)
    ... 7 more
===========================
Here is what I find interesting. I added some debugging println() calls
to Hadoop streaming (StreamJob.java and StreamUtil.java). Streaming can
see TestMapper at the job configuration stage (the
StreamJob.setJobConf() routine) but not later, when the map task runs.
The following code creates a new instance of TestMapper and calls the
toString() defined in TestMapper. It works.
    if (mapCmd_ != null) {
        c = StreamUtil.goodClassOrNull(mapCmd_, defaultPackage);
        if (c != null) {
            System.out.println("#######################");
            try {
                System.out.println(c.newInstance().toString());
            } catch (Exception e) { }
            System.out.println("#######################");
            jobConf_.setMapperClass(c);
        } else {
            ...
        }
    }
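One way to narrow this down might be a small probe that reports whether a class name resolves in the JVM it runs in, together with that JVM's java.class.path. The sketch below is only an illustration: ClassProbe is a hypothetical helper name, not part of Hadoop or streaming, and it rests on the unconfirmed assumption that the submitting JVM and the map task JVM see different classpaths. It uses only standard JDK calls.

```java
public class ClassProbe {

    /** Returns true if the named class can be found by this JVM's class loader. */
    public static boolean isLoadable(String className) {
        // Prefer the thread context class loader; fall back to our own loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassProbe.class.getClassLoader();
        }
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Builds a one-line report: class name, whether it loads, and the JVM classpath. */
    public static String describe(String className) {
        return className + " loadable=" + isLoadable(className)
                + " java.class.path=" + System.getProperty("java.class.path");
    }

    public static void main(String[] args) {
        // The default name is the class from this post; pass another name as an argument.
        String name = args.length > 0 ? args[0] : "org.company.TestMapper";
        System.out.println(describe(name));
    }
}
```

Printing ClassProbe.describe("org.company.TestMapper") next to the existing debugging println() calls, and again from code that runs inside the task, would show whether the class is visible in one JVM but not the other.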
I also tried shipping the jar file containing TestMapper with the
option "-file test_mapper.jar". The result is the same.
Could anybody advise me on this? Thanks in advance,
---
Andrey Pankov.