hi
We have a 4-machine cluster (dual-core 3.20 GHz CPU, 2 GB RAM, 400 GB disk per node). We use Nutch 0.9 and Hadoop 0.13.1, and we are trying to crawl the web (60K sites, depth 5). When we reached the parse of the 4th segment, every machine threw java.lang.OutOfMemoryError: Requested array size exceeds VM limit. Our segment size:
crawled/segments/20071002163239 3472754178
I tried several map/reduce configurations and nothing changed (400-50; 300-15; 50-15; 100-15; 200-35).
I also set the heap size to 2000M in hadoop-env and in the nutch script.
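For anyone reproducing this: in Hadoop of that era, the daemon heap (HADOOP_HEAPSIZE in hadoop-env.sh) and the heap of the child JVMs that actually run the map/reduce tasks are set separately. A minimal sketch of the child-JVM setting, assuming the 0.13-era property name mapred.child.java.opts (verify against your hadoop-default.xml):

```xml
<!-- hadoop-site.xml: heap for the per-task child JVMs that run the parse.
     HADOOP_HEAPSIZE in hadoop-env.sh only sizes the daemons, not these. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2000m</value>
</property>
```

If only hadoop-env.sh was changed, the parse tasks may still be running with the smaller default task heap.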



--
View this message in context: http://www.nabble.com/java.lang.OutOfMemoryError%3A-Requested-array-size-exceeds-VM-limit-tf4562352.html#a13020775
Sent from the Hadoop Users mailing list archive at Nabble.com.


  • Konstantin Shvachko at Oct 3, 2007 at 6:11 pm
    Hi,
    Could you also send a call stack? It is not clear which component is out
    of memory.
    If it is the name-node, you should check how many files, directories, and
    blocks there are by the time of the failure.
    If your crawl generates a lot of small files, that could be the cause.
    Let us know.
    --Konstantin
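A quick way to make that check is the era's DFS tooling; a sketch assuming Hadoop 0.13-style commands (`hadoop dfs`, not the later `hadoop fs`), run from the Hadoop install directory, with `/user/nutch/crawled` as a placeholder path:

```
# Totals for files, directories, and blocks across the DFS
bin/hadoop fsck / | tail -n 20

# Capacity and usage per datanode
bin/hadoop dfsadmin -report

# Rough count of files under the crawl output
bin/hadoop dfs -lsr /user/nutch/crawled | wc -l
```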




  • Uygar BAYAR at Oct 4, 2007 at 6:42 am
    hi
    It's not the name-node; there is only a single segment. Before the parse
    step, the fetch reduced it by a factor of 10.
    Here are the call stack and the files to be parsed; sorry for the long log.

    /user/nutch/sirketce/crawled/segments/20071002163239/content <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00000 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00000/data <r 3> 334429747
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00000/index <r 3> 14916
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00001 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00001/data <r 3> 327920464
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00001/index <r 3> 14930
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00002 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00002/data <r 3> 329962280
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00002/index <r 3> 14980
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00003 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00003/data <r 3> 328364139
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00003/index <r 3> 14724
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00004 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00004/data <r 3> 327625845
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00004/index <r 3> 14762
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00005 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00005/data <r 3> 328455639
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00005/index <r 3> 14889
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00006 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00006/data <r 3> 331291187
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00006/index <r 3> 14660
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00007 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00007/data <r 3> 323871321
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00007/index <r 3> 14681
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00008 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00008/data <r 3> 327993727
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00008/index <r 3> 14898
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00009 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00009/data <r 3> 323695463
    /user/nutch/sirketce/crawled/segments/20071002163239/content/part-00009/index <r 3> 14656
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00000 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00000/data <r 3> 8797532
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00000/index <r 3> 14508
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00001 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00001/data <r 3> 8759847
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00001/index <r 3> 14527
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00002 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00002/data <r 3> 8766600
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00002/index <r 3> 14583
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00003 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00003/data <r 3> 8787659
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00003/index <r 3> 14313
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00004 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00004/data <r 3> 8740838
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00004/index <r 3> 14352
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00005 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00005/data <r 3> 8736991
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00005/index <r 3> 14476
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00006 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00006/data <r 3> 8672715
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00006/index <r 3> 14265
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00007 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00007/data <r 3> 8695395
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00007/index <r 3> 14301
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00008 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00008/data <r 3> 8737508
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00008/index <r 3> 14483
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00009 <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00009/data <r 3> 8705316
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_fetch/part-00009/index <r 3> 14243
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate <dir>
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00000 <r 3> 8396806
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00001 <r 3> 14459204
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00002 <r 3> 6889290
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00003 <r 3> 5811612
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00004 <r 3> 7906811
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00005 <r 3> 6508687
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00006 <r 3> 6424363
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00007 <r 3> 5835119
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00008 <r 3> 6605622
    /user/nutch/sirketce/crawled/segments/20071002163239/crawl_generate/part-00009 <r 3> 5693232


    task_0002_m_000070_0: log4j:ERROR setFile(null,true) call failed.
    task_0002_m_000070_0: java.io.FileNotFoundException: /home/nutch/crawler1/logs (Is a directory)
    task_0002_m_000070_0: at java.io.FileOutputStream.openAppend(Native Method)
    task_0002_m_000070_0: at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
    task_0002_m_000070_0: at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
    task_0002_m_000070_0: at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
    task_0002_m_000070_0: at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
    task_0002_m_000070_0: at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
    task_0002_m_000070_0: at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
    task_0002_m_000070_0: at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
    task_0002_m_000070_0: at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
    task_0002_m_000070_0: at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
    task_0002_m_000070_0: at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
    task_0002_m_000070_0: at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
    task_0002_m_000070_0: at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
    task_0002_m_000070_0: at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
    task_0002_m_000070_0: at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
    task_0002_m_000070_0: at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
    task_0002_m_000070_0: at org.apache.log4j.Logger.getLogger(Logger.java:104)
    task_0002_m_000070_0: at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
    task_0002_m_000070_0: at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
    task_0002_m_000070_0: at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    task_0002_m_000070_0: at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    task_0002_m_000070_0: at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    task_0002_m_000070_0: at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
    task_0002_m_000070_0: at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
    task_0002_m_000070_0: at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
    task_0002_m_000070_0: at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
    task_0002_m_000070_0: at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:84)
    task_0002_m_000070_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1685)
    task_0002_m_000070_0: log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
    (The same log4j stack trace repeats verbatim for task attempts task_0002_m_000070_1 and task_0002_m_000070_2.)
    Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
    at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:131)
    at org.apache.nutch.parse.ParseSegment.main(ParseSegment.java:149)


  • Uygar BAYAR at Oct 4, 2007 at 2:29 pm
    hi
    I found the problem: nutch-site.xml was configured to parse almost
    everything, and because of that the VM array-size limit was exceeded.
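For readers hitting the same error: the Nutch settings that control how much gets parsed live in nutch-site.xml. A sketch of the usual knobs, with example values (property names from Nutch 0.9-era defaults; the values here are illustrative, not the poster's exact fix):

```xml
<!-- nutch-site.xml: cap the bytes of each fetched document kept for parsing.
     A value of -1 means unlimited and can exhaust task heap on huge pages. -->
<property>
  <name>http.content.limit</name>
  <value>65536</value>
</property>

<!-- Parse only selected content types instead of "almost everything". -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
```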




Discussion Overview
group: common-user
categories: hadoop
posted: Oct 3, '07 at 3:04p
active: Oct 4, '07 at 2:29p
posts: 4
users: 2
website: hadoop.apache.org...
irc: #hadoop
