Hi,
It is not the name-node; there is only a single segment. Before the parse
step, the fetch is reduced by a factor of 10.
Here are the call stack and the files to be parsed. Sorry for the long log.
/user/nutch/sirketce/crawled/segments/20071002163239/content <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00000 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00000/data <r 3> 334429747
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00000/index <r 3> 14916
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00001 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00001/data <r 3> 327920464
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00001/index <r 3> 14930
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00002 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00002/data <r 3> 329962280
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00002/index <r 3> 14980
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00003 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00003/data <r 3> 328364139
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00003/index <r 3> 14724
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00004 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00004/data <r 3> 327625845
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00004/index <r 3> 14762
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00005 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00005/data <r 3> 328455639
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00005/index <r 3> 14889
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00006 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00006/data <r 3> 331291187
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00006/index <r 3> 14660
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00007 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00007/data <r 3> 323871321
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00007/index <r 3> 14681
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00008 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00008/data <r 3> 327993727
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00008/index <r 3> 14898
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00009 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00009/data <r 3> 323695463
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-00009/index <r 3> 14656
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00000 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00000/data <r 3> 8797532
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00000/index <r 3> 14508
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00001 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00001/data <r 3> 8759847
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00001/index <r 3> 14527
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00002 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00002/data <r 3> 8766600
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00002/index <r 3> 14583
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00003 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00003/data <r 3> 8787659
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00003/index <r 3> 14313
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00004 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00004/data <r 3> 8740838
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00004/index <r 3> 14352
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00005 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00005/data <r 3> 8736991
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00005/index <r 3> 14476
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00006 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00006/data <r 3> 8672715
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00006/index <r 3> 14265
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00007 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00007/data <r 3> 8695395
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00007/index <r 3> 14301
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00008 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00008/data <r 3> 8737508
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00008/index <r 3> 14483
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00009 <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00009/data <r 3> 8705316
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00009/index <r 3> 14243
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate <dir>
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00000 <r 3> 8396806
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00001 <r 3> 14459204
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00002 <r 3> 6889290
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00003 <r 3> 5811612
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00004 <r 3> 7906811
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00005 <r 3> 6508687
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00006 <r 3> 6424363
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00007 <r 3> 5835119
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00008 <r 3> 6605622
/user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00009 <r 3> 5693232
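To put the listing in perspective, the content data sizes can be totalled with a small script. This is only a sketch: the byte counts are copied by hand from the content/part-*/data entries above, and the helper name summarize is made up for illustration.

```python
# Sizes in bytes of content/part-0000N/data, copied from the listing above.
content_data_bytes = [
    334429747, 327920464, 329962280, 328364139, 327625845,
    328455639, 331291187, 323871321, 327993727, 323695463,
]


def summarize(sizes):
    """Return part count, total size, largest part and total in GiB."""
    total = sum(sizes)
    return {
        "parts": len(sizes),
        "total_bytes": total,
        "largest_bytes": max(sizes),
        "total_gib": total / 2**30,
    }


summary = summarize(content_data_bytes)
print(summary)
```

So the parse job has to read about 3.3 GB of fetched content, spread fairly evenly at roughly 330 MB per part.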
task_0002_m_000070_0: log4j:ERROR setFile(null,true) call failed.
task_0002_m_000070_0: java.io.FileNotFoundException: /home/nutch/crawler1/logs (Is a directory)
task_0002_m_000070_0: at java.io.FileOutputStream.openAppend(Native Method)
task_0002_m_000070_0: at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
task_0002_m_000070_0: at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
task_0002_m_000070_0: at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
task_0002_m_000070_0: at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
task_0002_m_000070_0: at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
task_0002_m_000070_0: at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
task_0002_m_000070_0: at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
task_0002_m_000070_0: at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
task_0002_m_000070_0: at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
task_0002_m_000070_0: at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
task_0002_m_000070_0: at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
task_0002_m_000070_0: at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
task_0002_m_000070_0: at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
task_0002_m_000070_0: at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
task_0002_m_000070_0: at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
task_0002_m_000070_0: at org.apache.log4j.Logger.getLogger(Logger.java:104)
task_0002_m_000070_0: at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
task_0002_m_000070_0: at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
task_0002_m_000070_0: at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
task_0002_m_000070_0: at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
task_0002_m_000070_0: at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
task_0002_m_000070_0: at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
task_0002_m_000070_0: at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
task_0002_m_000070_0: at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
task_0002_m_000070_0: at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
task_0002_m_000070_0: at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:84)
task_0002_m_000070_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1685)
task_0002_m_000070_0: log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
task_0002_m_000070_1: log4j:ERROR setFile(null,true) call failed.
task_0002_m_000070_1: java.io.FileNotFoundException: /home/nutch/crawler1/logs (Is a directory)
task_0002_m_000070_1: at java.io.FileOutputStream.openAppend(Native Method)
task_0002_m_000070_1: at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
task_0002_m_000070_1: at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
task_0002_m_000070_1: at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
task_0002_m_000070_1: at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
task_0002_m_000070_1: at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
task_0002_m_000070_1: at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
task_0002_m_000070_1: at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
task_0002_m_000070_1: at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
task_0002_m_000070_1: at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
task_0002_m_000070_1: at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
task_0002_m_000070_1: at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
task_0002_m_000070_1: at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
task_0002_m_000070_1: at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
task_0002_m_000070_1: at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
task_0002_m_000070_1: at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
task_0002_m_000070_1: at org.apache.log4j.Logger.getLogger(Logger.java:104)
task_0002_m_000070_1: at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
task_0002_m_000070_1: at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
task_0002_m_000070_1: at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
task_0002_m_000070_1: at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
task_0002_m_000070_1: at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
task_0002_m_000070_1: at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
task_0002_m_000070_1: at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
task_0002_m_000070_1: at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
task_0002_m_000070_1: at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
task_0002_m_000070_1: at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:84)
task_0002_m_000070_1: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1685)
task_0002_m_000070_1: log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
task_0002_m_000070_2: log4j:ERROR setFile(null,true) call failed.
task_0002_m_000070_2: java.io.FileNotFoundException: /home/nutch/crawler1/logs (Is a directory)
task_0002_m_000070_2: at java.io.FileOutputStream.openAppend(Native Method)
task_0002_m_000070_2: at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
task_0002_m_000070_2: at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
task_0002_m_000070_2: at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
task_0002_m_000070_2: at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
task_0002_m_000070_2: at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
task_0002_m_000070_2: at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
task_0002_m_000070_2: at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
task_0002_m_000070_2: at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
task_0002_m_000070_2: at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
task_0002_m_000070_2: at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
task_0002_m_000070_2: at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
task_0002_m_000070_2: at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
task_0002_m_000070_2: at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
task_0002_m_000070_2: at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
task_0002_m_000070_2: at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
task_0002_m_000070_2: at org.apache.log4j.Logger.getLogger(Logger.java:104)
task_0002_m_000070_2: at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
task_0002_m_000070_2: at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
task_0002_m_000070_2: at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
task_0002_m_000070_2: at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
task_0002_m_000070_2: at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
task_0002_m_000070_2: at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
task_0002_m_000070_2: at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
task_0002_m_000070_2: at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
task_0002_m_000070_2: at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
task_0002_m_000070_2: at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:84)
task_0002_m_000070_2: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1685)
task_0002_m_000070_2: log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:131)
at org.apache.nutch.parse.ParseSegment.main(ParseSegment.java:149)
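The log4j:ERROR lines above say that the appender was handed /home/nutch/crawler1/logs, which is a directory, as its log file. One way this can happen is when the file path is built from a log directory plus a file name and the file name turns out to be empty; that is an assumption on my side, the log does not show the appender configuration. The sketch below reproduces the effect in Python rather than Java; the function name open_for_append and the file name task.log are made up for illustration.

```python
import os
import tempfile


def open_for_append(log_dir, log_file):
    """Join directory and file name, then try to open the result for append.

    Returns (path, None) on success or (path, exception class name) on failure.
    """
    path = os.path.join(log_dir, log_file)
    try:
        with open(path, "a"):
            return path, None
    except OSError as exc:
        # IsADirectoryError on Linux/macOS; other OSError subclasses elsewhere.
        return path, type(exc).__name__


with tempfile.TemporaryDirectory() as log_dir:
    # Empty file name: the joined path is just the directory, so opening fails.
    print(open_for_append(log_dir, ""))
    # Real file name: the file is created inside the directory.
    print(open_for_append(log_dir, "task.log"))
```

If that is what happens here, the messages would only mean that the child task cannot write its own log file; they do not show the OutOfMemoryError itself.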
Konstantin Shvachko wrote:
Hi
Could you also send a call stack? It is not clear which component is out
of memory.
If it is the name-node, then you should check how many files, dirs, and
blocks there are at the time of failure.
If your crawl generates a lot of small files, that could be the case.
Let us know.
--Konstantin
Uygar BAYAR wrote:
Hi,
We have a 4-machine cluster (dual-core CPU 3.20 GHz, 2 GB RAM, 400 GB disk).
We use Nutch 0.9 and Hadoop 0.13.1. We are trying to crawl the web (60K
sites) to a depth of 5. When we reached the parse of the 4th segment, it
failed with java.lang.OutOfMemoryError: Requested array size exceeds VM
limit on each machine. Our segment
size