Hi,

I was able to use streaming in Hadoop with Python for the wordcount program,
but I created a Mapper and Reducer in Java since all my code is currently in
Java.
I first tried this:
echo "foo foo quux labs foo bar quux" | java -cp ~/dummy.jar WCMapper | sort | java -cp ~/dummy.jar WCReducer
It gave the correct output:
labs 1
foo 3
bar 1
quux 2
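
(For reference, here is a minimal sketch of what stdin/stdout word-count classes like these could look like. WCMapper and WCReducer are the poster's own classes, so this is only an assumption about their behavior, not their actual code: each program reads lines from standard input and writes tab-separated key/value pairs to standard output, which is the contract Hadoop Streaming expects.)

// Hypothetical sketch only; not the poster's actual WCMapper/WCReducer.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

class WCMapper {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    System.out.println(word + "\t1");   // emit <word, 1>
                }
            }
        }
    }
}

class WCReducer {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        Map<String, Integer> counts = new HashMap<String, Integer>();
        String line;
        while ((line = in.readLine()) != null) {
            String[] parts = line.split("\t");          // input lines are "word<TAB>count"
            if (parts.length == 2) {
                Integer prev = counts.get(parts[0]);
                int n = Integer.parseInt(parts[1]);
                counts.put(parts[0], prev == null ? n : prev + n);
            }
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}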

Then, I installed a single-node Hadoop cluster and tried this (by adapting the
Python command): hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar
-mapper "java -cp ~/dummy.jar WCMapper" -reducer "java -cp ~/dummy.jar
WCReducer" -input gutenberg/* -output gutenberg-output -file dummy.jar

This is the error:
hadoop@siddhartha-laptop:/usr/local/hadoop$ hadoop jar
contrib/streaming/hadoop-streaming-0.20.203.0.jar -mapper "java -cp
~/dummy.jar WCMapper" -reducer "java -cp ~/dummy.jar WCReducer" -input
gutenberg/* -output gutenberg-output -file dummy.jar
packageJobJar: [dummy.jar, /app/hadoop/tmp/hadoop-unjar5573454211442575176/]
[] /tmp/streamjob6721719460213928092.jar tmpDir=null
11/06/04 20:47:15 INFO mapred.FileInputFormat: Total input paths to process
: 3
11/06/04 20:47:15 INFO streaming.StreamJob: getLocalDirs():
[/app/hadoop/tmp/mapred/local]
11/06/04 20:47:15 INFO streaming.StreamJob: Running job:
job_201106031901_0039
11/06/04 20:47:15 INFO streaming.StreamJob: To kill this job, run:
11/06/04 20:47:15 INFO streaming.StreamJob:
/usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311
-kill job_201106031901_0039
11/06/04 20:47:15 INFO streaming.StreamJob: Tracking URL:
http://localhost:50030/jobdetails.jsp?jobid=job_201106031901_0039
11/06/04 20:47:16 INFO streaming.StreamJob: map 0% reduce 0%
11/06/04 20:48:00 INFO streaming.StreamJob: map 100% reduce 100%
11/06/04 20:48:00 INFO streaming.StreamJob: To kill this job, run:
11/06/04 20:48:00 INFO streaming.StreamJob:
/usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311
-kill job_201106031901_0039
11/06/04 20:48:00 INFO streaming.StreamJob: Tracking URL:
http://localhost:50030/jobdetails.jsp?jobid=job_201106031901_0039
11/06/04 20:48:00 ERROR streaming.StreamJob: Job not successful. Error: NA
11/06/04 20:48:00 INFO streaming.StreamJob: killJob…
Streaming Job Failed!

Any advice?

Sincerely,
Siddhartha Jonnalagadda,
Text mining Researcher, Lnx Research, LLC, Orange, CA
sjonnalagadda.wordpress.com




  • Marcos Ortiz Valmaseda at Jun 5, 2011 at 5:59 pm
    Why are you using Java in streaming mode instead of the native Mapper/Reducer code?
    Can you show us the JobTracker's logs?

    Regards
    ----- Original Message -----
    From: "Siddhartha Jonnalagadda" <sid.kgp@gmail.com>
    To: mapreduce-user@hadoop.apache.org
    Sent: Sunday, June 5, 2011 7:16:08 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
    Subject: question about using java in streaming mode


    --
    Marcos Luís Ortíz Valmaseda
    Software Engineer (Large-Scaled Distributed Systems)
    http://marcosluis2186.posterous.com
  • Siddhartha Jonnalagadda at Jun 5, 2011 at 8:02 pm
    Hi Marcos,

    I thought that streaming would make it easier, because I was getting
    different errors when extending Mapper and Reducer in Java.

    I tried: hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar -file
    dummy.jar -mapper "java -cp dummy.jar WCMapper" -reducer "java -cp dummy.jar
    WCReducer" -input gutenberg/* -output gutenberg-output

    The error log in the map task:
    *stderr logs*

    Exception in thread "main" java.lang.NoClassDefFoundError: WCMapper
    Caused by: java.lang.ClassNotFoundException: WCMapper
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)

    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)

    Could not find the main class: WCMapper. Program will exit.
    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
    failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)

    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:121)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)

    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
    at java.security.AccessController.doPrivileged(Native Method)

    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:253)
    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
    failed with code 1

    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)

    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)

    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)

    at org.apache.hadoop.mapred.Child.main(Child.java:253)

    ------------------------------



    Sincerely,
    Siddhartha Jonnalagadda,
    sjonnalagadda.wordpress.com


  • Siddhartha Jonnalagadda at Jun 5, 2011 at 8:19 pm
    This sounds stupid, but the mapper part now works fine if I use -files
    dummy.jar instead of -file dummy.jar.

    The reducer is going into an infinite loop (0%, 22%, 0%, ...), but I will
    figure that out later!

    Sincerely,
    Siddhartha Jonnalagadda,
    sjonnalagadda.wordpress.com


  • Marcos Ortiz at Jun 5, 2011 at 9:51 pm

    On 6/5/2011 4:01 PM, Siddhartha Jonnalagadda wrote:
    Hi Marcos,

    I thought that streaming would make it easier because I was getting
    different errors with extending mapper and reducer in java.

    I tried: hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar
    -file dummy.jar -mapper "java -cp dummy.jar WCMapper" -reducer "java
    -cp dummy.jar WCReducer" -input gutenberg/* -output gutenberg-output

    The error log in the map task:
    *_stderr logs_*
    Exception in thread "main" java.lang.NoClassDefFoundError: WCMapper
    Caused by: java.lang.ClassNotFoundException: WCMapper
    What is the definition of your classpath? This error is raised when the
    JVM cannot find the definition of a class.
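
    (A quick way to check this would be to log, from inside the spawned task
    process, what classpath and working directory it actually gets; the stderr
    output shows up in the task logs on the JobTracker web UI. This is a
    hypothetical diagnostic sketch, not something from the thread, and CPDump
    is an invented helper class run the same way as WCMapper:)

    // Hypothetical diagnostic, e.g. run with: -mapper "java -cp dummy.jar CPDump"
    class CPDump {
        public static void main(String[] args) throws Exception {
            // Print the effective classpath and working directory of the task JVM to stderr.
            System.err.println("java.class.path = " + System.getProperty("java.class.path"));
            System.err.println("user.dir = " + System.getProperty("user.dir"));
            // Drain stdin so the streaming framework does not see a broken pipe.
            while (System.in.read() != -1) { }
        }
    }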

    --
    Marcos Luís Ortíz Valmaseda
    Software Engineer (UCI)
    http://marcosluis2186.posterous.com
    http://twitter.com/marcosluis2186
