Hi,
In the beginning, I ran "hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'" successfully, but when I ran "nutch crawl url -dir crawl -depth 3", I got these errors:
-------------------------------------------------------------------------
10/08/07 22:53:30 INFO crawl.Crawl: crawl started in: crawl
.....................................................................
10/08/07 22:53:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Exception in thread "main" java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
.....................................................................
at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
.....................................................................
... 9 more
Caused by: java.lang.IllegalArgumentException: Compression codec
org.apache.hadoop.io.compress.GzipCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
.....................................................................
... 14 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.compress.GzipCodec
.....................................................................
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
... 16 more

-------------------------------------------------------------------------
So GzipCodec didn't get loaded successfully here, or maybe it isn't loaded by default; I don't know, but I think it should be.
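To make sure I understand the failure: judging from the stack trace, CompressionCodecFactory splits the io.compression.codecs string on commas and hands every entry to the classloader without trimming it. Below is a minimal sketch of that logic as I understand it, my own reconstruction rather than Hadoop's actual source, written with plain JDK class names so it runs anywhere:
-------------------------------------------------------------------------
// Minimal sketch, my reconstruction of CompressionCodecFactory's parsing in
// 0.20.x, not Hadoop's actual source: each comma-separated entry is passed
// to the classloader as-is, so an entry that is not on the classpath, or one
// padded with whitespace/newlines inside the XML <value>, surfaces as the
// "Compression codec ... not found" IllegalArgumentException shown above.
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class CodecListSketch {

    static List<Class<?>> getCodecClasses(String codecsProperty) {
        List<Class<?>> result = new ArrayList<Class<?>>();
        StringTokenizer split = new StringTokenizer(codecsProperty, ",");
        while (split.hasMoreTokens()) {
            String name = split.nextToken(); // note: the token is not trimmed
            try {
                result.add(Class.forName(name));
            } catch (ClassNotFoundException ex) {
                throw new IllegalArgumentException(
                        "Compression codec " + name + " not found.", ex);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Loads fine:
        System.out.println(getCodecClasses("java.util.Date,java.lang.String"));
        // A newline or space around an entry, easy to get inside an XML
        // <value>, makes the very same class "not found":
        getCodecClasses("java.util.Date,\n java.lang.String");
    }
}
-------------------------------------------------------------------------
If that reading is right, either a jar missing from the job's classpath or stray whitespace inside the <value> element would produce exactly the error above.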
I then followed this link: http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ to install LZO, then ran "nutch crawl url -dir crawl -depth 3" again and got these errors:
-------------------------------------------------------------------------
10/08/07 22:40:41 INFO crawl.Crawl: crawl started in: crawl
.....................................................................
10/08/07 22:40:42 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
10/08/07 22:40:42 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Exception in thread "main" java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
.....................................................................
at org.apache.nutch.crawl.Injector.inject(Injector.java:211)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
.....................................................................
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 9 more
Caused by: java.lang.IllegalArgumentException: Compression codec
org.apache.hadoop.io.compress.GzipCodec not found.
.....................................................................
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:41)
... 14 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.compress.GzipCodec
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
.....................................................................
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
... 16 more

-------------------------------------------------------------------------
Then I ran the example job "hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'" again and got these errors:
-------------------------------------------------------------------------
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:400)
.....................................................................
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
.....................................................................
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 22 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec
not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:41)
... 27 more
Caused by: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
.....................................................................
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
... 29 more

-------------------------------------------------------------------------
When I run "ps -aef | grep gpl", I get this output:
-------------------------------------------------------------------------

alex 2267 1 1 22:04 pts/1 00:00:04 /usr/local/hadoop/jdk1.6.0_21/bin/java -Xmx200m -Dcom.sun.management.jmxremote
-..............................................
/usr/local/hadoop/hadoop-0.20.2/bin/../conf:/usr/local/hadoop/jdk1.6.0_21/lib/tools.jar:/usr/local/hadoop/hadoop-0.20.2/bin/..:/usr/local/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-
core.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-cli-1.2.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-
codec-1.3.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-el-1.0.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-.-
net-1.4.1.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/core-3.1.1.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/hadoop-gpl-compression-0.2.0-
dev.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/hsqldb-1.8.0.10.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jasper-
compiler-5.5.12.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jasper-
runtime-5.5.12.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jets3t-0.6.1.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jetty-6.1.14.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jetty-
.......................................-
log4j12-1.4.3.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/xmlenc-0.52.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jsp-2.1/jsp-2.1.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jsp-2.1/jsp-
api-2.1.jar org.apache.hadoop.hdfs.server.namenode.NameNode

See, the two jars (hadoop-core and hadoop-gpl-compression) are there on the NameNode's classpath, but it seems the job can't reference them. Before this I had also tried installing hadoop-lzo (http://github.com/kevinweil/hadoop-lzo) and got the same errors; maybe hadoop-lzo only works with Hadoop 0.20, not 0.20.1/0.20.2, I don't know. After one month I still haven't solved this problem, and it's killing me. Below I post all my configuration files; would you please help me dig the problem out? Thank you.
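In case it helps with diagnosis, here is a tiny standalone probe, class name and layout entirely mine, that checks whether the two codec classes are resolvable at all. Running it via "hadoop ClasspathProbe" should give it the same classpath that bin/hadoop builds for a job client:
-------------------------------------------------------------------------
// Hypothetical diagnostic, not part of Hadoop or Nutch: checks whether the
// codec classes from io.compression.codecs can be resolved by this JVM.
public class ClasspathProbe {
    public static void main(String[] args) {
        String[] names = {
            "org.apache.hadoop.io.compress.GzipCodec",
            "com.hadoop.compression.lzo.LzoCodec",
        };
        for (String name : names) {
            try {
                Class.forName(name);
                System.out.println("OK      " + name);
            } catch (Throwable t) { // ClassNotFoundException, or a failed
                                    // static initializer (e.g. native libs)
                System.out.println("MISSING " + name + " (" + t + ")");
            }
        }
    }
}
-------------------------------------------------------------------------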

core-site.xml
-------------------------------------------------------------------------
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://AlexLuya:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/alex/tmp</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec
    </value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
-------------------------------------------------------------------------
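One thing I notice only now while posting: in the file above there is a line break between the last codec name and </value>. A small sketch like the following, class name PrintCodecs is mine, would print every entry exactly as Hadoop parses it, so stray whitespace or newlines become visible between the brackets:
-------------------------------------------------------------------------
// Sketch (class name is mine): print each comma-separated entry of
// io.compression.codecs exactly as Hadoop sees it after loading
// core-site.xml, bracketed so that whitespace shows up.
import org.apache.hadoop.conf.Configuration;

public class PrintCodecs {
    public static void main(String[] args) {
        Configuration conf = new Configuration(); // reads core-site.xml from the classpath
        String codecs = conf.get("io.compression.codecs", "");
        for (String entry : codecs.split(",")) {
            System.out.println("[" + entry + "]");
        }
    }
}
-------------------------------------------------------------------------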
mapred-site.xml
-------------------------------------------------------------------------
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>AlexLuya:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/alex/hadoop/mapred/local</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/tmp/hadoop/mapred/system</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
-------------------------------------------------------------------------
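I am also not sure the two mapreduce.* names above do anything on 0.20.2; if I read the 0.20 API right, JobConf writes the older mapred.* keys for map-output compression, which this sketch, class name mine again, would confirm:
-------------------------------------------------------------------------
// Sketch (class name is mine): on 0.20.x the JobConf API writes the
// old-style keys for map-output compression, which suggests the
// mapreduce.* names in mapred-site.xml above may be silently ignored.
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.JobConf;

public class MapOutputCompressionKeys {
    public static void main(String[] args) {
        JobConf job = new JobConf();
        job.setCompressMapOutput(true);
        job.setMapOutputCompressorClass(GzipCodec.class);
        System.out.println(job.get("mapred.compress.map.output"));          // "true"
        System.out.println(job.get("mapred.map.output.compression.codec")); // GzipCodec
    }
}
-------------------------------------------------------------------------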
hadoop-env.sh
-------------------------------------------------------------------------
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
export JAVA_HOME=/usr/local/hadoop/jdk1.6.0_21

# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=200

# Extra Java runtime options. Empty by default.
#export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS

# Extra ssh options. Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

# Where log files are stored. $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from. Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
#export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes. See 'man nice'.
# export HADOOP_NICENESS=10

-------------------------------------------------------------------------
