OutOfMemory
Hi,

I tried to update my db, using the following command:
bin/nutch updatedb crawld/crawldb crawld/segments/20070628095836

and my 2 nodes had an error; I can see the following exception:
2007-06-30 12:24:29,688 INFO mapred.TaskInProgress - Error from task_0001_m_000000_1: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.io.Text.write(Text.java:243)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:316)
    at org.apache.nutch.crawl.CrawlDbFilter.map(CrawlDbFilter.java:99)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)


My cluster of 2 machines has 512 MB of memory on each. Isn't that enough?
What is the best practice?

Do you have any idea whether this is a bug, or is it just my configuration that is not correct?

Thanks for your help


  • Avinash Lakshman at Jun 30, 2007 at 6:51 pm
    There is an element in the config for Java params. Set it to -Xms1024M
    and give it a shot. It definitely seems like a case of you running out
    of heap space.

    A
    -----Original Message-----
    From: Emmanuel JOKE
    Sent: Saturday, June 30, 2007 10:32 AM
    To: hadoop-user
    Subject: OutOfMemory

    ...
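
    The "element in the config" here is presumably the child JVM options in
    hadoop-site.xml; the property name, mapred.child.java.opts, comes up later
    in this thread. A minimal sketch, with the heap values purely illustrative:

    <property>
      <name>mapred.child.java.opts</name>
      <!-- -Xms sets the initial heap, -Xmx the maximum; the stock default is -Xmx200m -->
      <value>-Xms1024m -Xmx1024m</value>
    </property>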
  • Ted Dunning at Jul 1, 2007 at 9:11 pm
    If you are using machines with only 512MB of memory, it is probably a very
    bad idea to set the minimum heap size so large.

    -Xms400M might be more appropriate.

    I should say, though, that if you have a program that is worth using Hadoop
    on, you have a problem that is worth putting more memory on each processor.
    Most of the work I do benefits more from memory than from processor, at
    least up to 1-2GB of RAM.
    On 6/30/07 11:51 AM, "Avinash Lakshman" wrote:

    ...
  • Emmanuel JOKE at Jul 2, 2007 at 12:11 pm
    Thanks for your advice, but I finally fixed my problem by increasing the
    number of map tasks to 200, as described in the tutorial:
    http://wiki.apache.org/nutch/Nutch0%2e9-Hadoop0%2e10-Tutorial
    ==> "I noticed that the number of map and reduce tasks has an impact on the
    performance of Hadoop. Many times after crawling a lot of pages the nodes
    reported 'java.lang.OutOfMemoryError: Java heap space' errors; this also
    happened in the indexing part. Increasing the number of maps solved these
    problems: with an index that has over 200,000 pages I needed 306 maps in
    total over 3 machines. By setting the mapred.map.tasks property in
    hadoop-site.xml to 99 (much higher than what is advised in other tutorials
    and in the hadoop-site.xml file) that problem is solved."
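
    For concreteness, the corresponding hadoop-site.xml entry would be along
    the lines of the sketch below; note that mapred.map.tasks is only a hint to
    the framework, since the InputFormat ultimately decides how many splits
    (and therefore map tasks) a job gets:

    <!-- sketch only: spreading the input over more map tasks means each task
         holds less data in memory at once -->
    <property>
      <name>mapred.map.tasks</name>
      <value>200</value>
    </property>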

    Besides, there is something I don't understand. The default configuration is
    mapred.tasktracker.tasks.maximum=2 and mapred.child.java.opts = -Xmx200m,
    so increasing the total memory from 512 MB to 1024 MB won't change anything,
    will it?

    Anyway, thanks for your help.
  • Ted Dunning at Jul 2, 2007 at 3:01 pm
    I think you are saying that 2 x 200 MB is less than 1024 MB, so the system
    won't use all of your memory.

    You are correct that you would have to change these options to make full use
    of your memory. But nothing is stopping you from changing them, either by
    increasing the number of tasks or the amount of memory per task.

    On 7/2/07 5:11 AM, "Emmanuel JOKE" wrote:

    ...
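
    To make the arithmetic concrete: with the defaults quoted above, each node
    runs at most 2 concurrent tasks with a 200 MB max heap each, i.e. roughly
    400 MB of child heap in total, so going from 512 MB to 1024 MB of RAM does
    nothing for the tasks by itself. A sketch of the two hadoop-site.xml knobs
    being discussed, with values that are only illustrative for a 1024 MB node:

    <!-- illustrative only: 2 concurrent tasks x 400 MB max heap ~= 800 MB,
         leaving headroom on a 1024 MB node for the TaskTracker and DataNode
         daemons themselves -->
    <property>
      <name>mapred.tasktracker.tasks.maximum</name>
      <value>2</value>
    </property>
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx400m</value>
    </property>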
