Out of heap space errors on TTs
Hey guys,

I am running hive and I am trying to join two tables (2.2GB and 136MB) on a
cluster of 9 nodes (replication = 3)

Hadoop version - 0.20.2
Each data node memory - 2GB
HADOOP_HEAPSIZE - 1000MB

Other heap settings are at their defaults. My Hive job launches 40 map tasks, and every
task fails with the same error:

2011-09-19 18:37:17,110 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 300
2011-09-19 18:37:17,223 FATAL org.apache.hadoop.mapred.TaskTracker:
Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)


Looks like I need to tweak some of the heap settings for TTs to handle
the memory efficiently. I am unable to understand which variables to
modify (there are too many related to heap sizes).

Any specific things I must look at?

Thanks,

jS


  • Mapred Learn at Sep 19, 2011 at 12:58 pm
What is mapred.child.java.opts set to in your TaskTracker configuration?
You need to set it to a larger value, e.g. 1 GB or so.

    Sent from my iPhone
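
    The setting mentioned above lives in mapred-site.xml on each TaskTracker node. A minimal sketch (the -Xmx value here is an assumption, not from the thread; it just needs to comfortably exceed io.sort.mb = 300):

    ```xml
    <!-- mapred-site.xml (sketch): per-task child JVM heap.
         The value must exceed io.sort.mb (300 in this thread), with
         headroom left for the rest of the map task's memory use. -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m</value>
    </property>
    ```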
  • Uma Maheswara Rao G 72686 at Sep 19, 2011 at 12:58 pm
    Hello,

You need to configure the heap size for the child tasks using the property
"mapred.child.java.opts" in mapred-site.xml.

By default it is 200MB, but your io.sort.mb (300) is larger than that, so the
sort buffer alone cannot fit in the heap. Configure more heap space for the
child tasks, e.g.:
-Xmx512m

    Regards,
    Uma

  • John smith at Sep 19, 2011 at 1:32 pm
    Hi all,

    Thanks for the inputs...

Can I reduce io.sort.mb (given that I have only 2GB of RAM per node)?

My conf files don't have an entry for mapred.child.java.opts, so I guess it's
taking the default value of 200MB.

Also, how do I decide the number of tasks per TT? I have 4 cores per node and
2GB of total memory. What maximum number of tasks per node should I set?

    Thanks
  • Bejoy Hadoop at Sep 19, 2011 at 1:38 pm
    John
Can you share the Hive QL you are using for the join?

    Regards
    Bejoy K S

  • John smith at Sep 19, 2011 at 1:43 pm
    Hi,

It's a simple join:

select count(*) from customer JOIN supplier ON (customer.c_nationkey =
supplier.s_nationkey);

customer (2.2GB) and supplier (137MB) are generated TPC-H tables.

A total of 40 map tasks is launched for this query.

    Thanks
  • Uma Maheswara Rao G 72686 at Sep 19, 2011 at 1:46 pm
    Hello John

You can use the properties below to control the number of task slots per node:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
By default each of these is 2.

AFAIK you can reduce io.sort.mb, but the map tasks will then spill to disk
more often, so disk usage will be higher.

Since this is related to mapred, I have moved this discussion to the Mapreduce
list and cc'ed common.


    Regards,
    Uma
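
    As a rough sketch of how these slot counts interact with the 2GB nodes in this thread (the specific numbers are assumptions, not a tested recommendation): the concurrent task heaps plus the DataNode and TaskTracker daemons all have to fit in physical RAM, or tasks will swap or hit OOM again.

    ```xml
    <!-- mapred-site.xml (sketch): with mapred.child.java.opts = -Xmx512m,
         2 map slots + 1 reduce slot is about 1.5 GB of task heap, leaving
         roughly 0.5 GB for the DataNode, TaskTracker and OS on a 2 GB node. -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>1</value>
    </property>
    ```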


  • Bejoy KS at Sep 19, 2011 at 2:40 pm
    John,
Did you try a map join with Hive? It uses the DistributedCache and in-memory
hash tables to achieve the goal:
set hive.auto.convert.join = true;
I have tried the same for joins involving huge tables and a few smaller
tables. My smaller tables were less than 25MB (configuration tables) and it
worked for me. In your case, since the smaller table is 137MB, I'm not sure
whether you should go for this or not. Let us leave that part for the
experts to comment on.
Also, map joins by default work only if the smaller table is less than 25MB.
You can try increasing that threshold to suit your requirements with:
set hive.smalltable.filesize = 150000000;
I'm really not sure whether that is advisable in your scenario; I'm leaving it
to the experts to comment on the same.

All,
A quick query from my end: what is the maximum size of a file that can be
distributed via the cache in MapReduce jobs? I'm looking for an optimal value
along with the maximum permissible one (one that doesn't impact the execution
of the basic MapReduce job). Does that depend on the cluster size or on the
individual node hardware configuration?
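
    Put together, the map-join experiment described above would look roughly like this in a Hive session (the 150000000 threshold just clears supplier's 137MB; whether that is safe on 2GB nodes is exactly the open question above):

    ```sql
    -- Hive session (sketch): ask Hive to convert the join to a map join,
    -- raising the small-table threshold above supplier's 137 MB.
    set hive.auto.convert.join = true;
    set hive.smalltable.filesize = 150000000;

    select count(*)
    from customer join supplier
      on (customer.c_nationkey = supplier.s_nationkey);
    ```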


Discussion Overview
group: common-user @ hadoop
posted: Sep 19, '11 at 12:43p
active: Sep 19, '11 at 2:40p
posts: 8
users: 4
website: hadoop.apache.org...
irc: #hadoop
