Hi all,

I'm still new to Hadoop. I'd like to use Hadoop streaming to combine a
mapper written as a Java class with a reducer written as a C++ program.
I'm just starting on this task and am having trouble with the Java class.
It looks something like this:


package org.company;
...
public class TestMapper extends MapReduceBase implements Mapper {
    ...
    public void map(WritableComparable key, Writable value,
            OutputCollector output, Reporter reporter) throws IOException {
        ...


I created a jar file containing my class, and it is accessible via
$CLASSPATH. I'm running the streaming job with:

$HSTREAMING -mapper org.company.TestMapper -reducer "wc -l" -input /data \
    -output /out1

Hadoop cannot find the TestMapper class. I'm using hadoop-0.16.0. The
error is:

===========================
2008-03-07 18:58:07,734 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2008-03-07 18:58:07,833 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2008-03-07 18:58:07,910 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.company.TestMapper
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:639)
    at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:728)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:204)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.company.TestMapper
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:607)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:631)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: org.company.TestMapper
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:587)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:605)
    ... 7 more
===========================

Here is what I find interesting. I put some debugging println() calls
into Hadoop streaming (StreamJob.java and StreamUtil.java). Streaming can
see TestMapper at the job configuration stage (the StreamJob.setJobConf()
routine) but not later, when the child task runs. The following code
creates a new instance of TestMapper and calls the toString() defined in
TestMapper. It works.

if (mapCmd_ != null) {
    c = StreamUtil.goodClassOrNull(mapCmd_, defaultPackage);
    if (c != null) {
        System.out.println("#######################");
        try {
            System.out.println(c.newInstance().toString());
        } catch (Exception e) { }
        System.out.println("#######################");
        jobConf_.setMapperClass(c);
    } else {
        ...
    }
}
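
A note on why the check above can pass while the task still fails: the println() runs in the JVM that submits the job, whereas the stack trace comes from TaskTracker$Child, a separate JVM with its own classpath. The sketch below is not Hadoop code. It is a minimal illustration, assuming a hypothetical class named ClassVisibilityDemo, of how the same class name can resolve through one class loader and fail through another. The "isolated" loader only stands in for a JVM whose classpath lacks the jar.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Illustrates that whether a class can be loaded depends on the class
 * loader (and so the classpath) of the JVM doing the lookup.
 */
public class ClassVisibilityDemo {

    /** Returns true if the given loader can resolve the class name. */
    public static boolean canLoad(ClassLoader loader, String className) {
        try {
            // 'false' means: locate the class but do not initialize it.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /**
     * A loader with no URLs and no application parent. It stands in for
     * a JVM whose classpath does not include the user's jar.
     */
    public static ClassLoader isolatedLoader() {
        return new URLClassLoader(new URL[0], null);
    }

    public static void main(String[] args) {
        String name = ClassVisibilityDemo.class.getName();
        ClassLoader submitting = ClassVisibilityDemo.class.getClassLoader();

        // Lookup through the loader that loaded this program.
        System.out.println("submitting side can load " + name + ": "
                + canLoad(submitting, name));

        // Lookup through a loader that does not know the application classpath.
        System.out.println("isolated side can load " + name + ": "
                + canLoad(isolatedLoader(), name));
    }
}
```

Run with a plain "java ClassVisibilityDemo", the first lookup should succeed and the second should fail, which mirrors a class that is visible at job configuration time but missing in the child task.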


I tried adding the jar file containing TestMapper with the option
"-file test_mapper.jar". The result is the same.

Could anybody give me some advice? Thanks in advance,

---
Andrey Pankov.


  • Amareshwari Sriramadasu at Mar 12, 2008 at 4:14 am
    Hi Andrey,

    I think that is a classpath problem.
    Can you try the patch at
    https://issues.apache.org/jira/browse/HADOOP-2622 and see whether you
    still have the problem?

    Thanks
    Amareshwari.

  • Andrey Pankov at Mar 12, 2008 at 2:12 pm
    Hi Amareshwari,

    I have applied that patch and my job ran successfully. I had to specify
    the jar file with the '-file' option, even though it is available via
    $CLASSPATH:

    $HSTREAMING -mapper org.company.TestMapper -reducer "cat" -input /data \
        -output /out4 -file /path/to/test_mapper.jar

    Thanks a lot!


  • Peeyush Bishnoi at Mar 12, 2008 at 5:54 am
    Hello Andrey,

    Take a look at -cacheDir with streaming; it may help you out:

    http://hadoop.apache.org/core/docs/current/streaming.html#Large+files+and+archives+in+Hadoop+Streaming


    Thank you,

    ---
    Peeyush
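
For the '-file' fix reported in the thread, one thing that may be worth checking before shipping a jar is that it really holds the mapper class under the path matching its package. The sketch below is an assumption-laden illustration, not part of Hadoop: the class name JarClassCheck and its helper methods are hypothetical, and the demo jar it writes contains placeholder bytes rather than a real class file.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

/** Checks whether a jar contains the .class entry for a given class name. */
public class JarClassCheck {

    /** Maps a class name such as org.company.TestMapper to its jar entry path. */
    public static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar holds a .class entry for the given class name. */
    public static boolean jarContainsClass(File jar, String className)
            throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.getJarEntry(entryNameFor(className)) != null;
        }
    }

    /** Builds a throwaway jar with one placeholder entry, for demonstration only. */
    public static File writeDemoJar(String className) throws IOException {
        File jar = File.createTempFile("test_mapper", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
            out.putNextEntry(new JarEntry(entryNameFor(className)));
            out.write(new byte[] {0}); // placeholder bytes, not a real class file
            out.closeEntry();
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        // With no arguments, fall back to a generated demo jar.
        String className = args.length > 1 ? args[1] : "org.company.TestMapper";
        File jar = args.length > 0 ? new File(args[0]) : writeDemoJar(className);
        System.out.println(className + " found in " + jar.getName() + ": "
                + jarContainsClass(jar, className));
    }
}
```

It could be run as "java JarClassCheck /path/to/test_mapper.jar org.company.TestMapper". A negative answer would point to a packaging problem rather than a streaming classpath problem.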

Discussion Overview
group: common-user @
category: hadoop
posted: Mar 11, '08 at 3:32p
active: Mar 12, '08 at 2:12p
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop
