Grokbase Groups Hive dev April 2012
Rohini Palaniswamy created HIVE-2988:
----------------------------------------

Summary: Use of XMLEncoder to serialize MapredWork causes OOM in hive cli
Key: HIVE-2988
URL: https://issues.apache.org/jira/browse/HIVE-2988
Project: Hive
Issue Type: Improvement
Components: CLI
Reporter: Rohini Palaniswamy


When running queries on tables with 6,000 partitions, the Hive CLI runs out of memory if configured with a 128 MB heap. A heap dump showed 37 MB occupied by a single XMLEncoder object while the MapredWork itself was only 500 KB, which is highly inefficient. We should switch to something more efficient, such as XStream.
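To make the overhead concrete, here is a minimal, self-contained sketch (not Hive code; the `Plan` bean is a hypothetical stand-in for MapredWork) showing how XMLEncoder's verbose output dwarfs the raw data being serialized:

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;

public class XmlEncoderSizeDemo {
    // Hypothetical stand-in for a query plan holding many partition paths.
    public static class Plan {
        private ArrayList<String> paths = new ArrayList<>();
        public ArrayList<String> getPaths() { return paths; }
        public void setPaths(ArrayList<String> p) { paths = p; }
    }

    // Serialize any bean to XML in memory, the way Hive serializes plans.
    static byte[] toXml(Object o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (XMLEncoder enc = new XMLEncoder(buf)) {
            enc.writeObject(o);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        Plan plan = new Plan();
        for (int i = 0; i < 6000; i++) {
            plan.getPaths().add("/warehouse/table/part=" + i);
        }
        int raw = 0;
        for (String s : plan.getPaths()) raw += s.length();
        // The XML is several times larger than the raw strings, and the
        // encoder also buffers intermediate bookkeeping objects internally,
        // which is consistent with heap-dump overhead like the reported 37 MB.
        System.out.println("raw chars: " + raw);
        System.out.println("xml bytes: " + toXml(plan).length);
    }
}
```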

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

  • Philip Tromans (JIRA) at Apr 29, 2012 at 10:15 am
    [ https://issues.apache.org/jira/browse/HIVE-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264501#comment-13264501 ]

    Philip Tromans commented on HIVE-2988:
    --------------------------------------

    This might not be related, but I've also seen an intermittent StackOverflowError (when Hive is serializing tasks at the beginning of a job) where most of the stack trace is within the XMLEncoder as well. Has anyone else had a problem with this?
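The StackOverflowError is plausible given how XMLEncoder traverses an object graph: it recurses through nested bean properties, one or more stack frames per level. A minimal sketch (a hypothetical Node bean, not Hive's task classes) illustrating the recursive structure:

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class DeepGraphDemo {
    // Each node references the next, forming a deeply nested graph.
    public static class Node {
        private Node next;
        public Node getNext() { return next; }
        public void setNext(Node n) { next = n; }
    }

    // Build a singly linked chain of the given depth.
    static Node chain(int depth) {
        Node head = new Node(), cur = head;
        for (int i = 1; i < depth; i++) {
            Node n = new Node();
            cur.setNext(n);
            cur = n;
        }
        return head;
    }

    static int depth(Node n) {
        int d = 0;
        for (; n != null; n = n.getNext()) d++;
        return d;
    }

    // XMLEncoder descends the graph recursively, so while a modest depth
    // round-trips fine, depths in the thousands risk a StackOverflowError.
    static byte[] encode(Node head) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (XMLEncoder enc = new XMLEncoder(buf)) {
            enc.writeObject(head);
        }
        return buf.toByteArray();
    }

    static Node decode(byte[] xml) {
        try (XMLDecoder dec = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return (Node) dec.readObject();
        }
    }

    public static void main(String[] args) {
        byte[] xml = encode(chain(50));
        System.out.println("round-tripped depth: " + depth(decode(xml)));
    }
}
```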
  • Ashutosh Chauhan (JIRA) at Apr 29, 2012 at 3:58 pm
    [ https://issues.apache.org/jira/browse/HIVE-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264575#comment-13264575 ]

    Ashutosh Chauhan commented on HIVE-2988:
    ----------------------------------------

    HIVE-2738 also reports a problem with XMLEncoder, though it looks unrelated.
  • Edward Capriolo (JIRA) at Apr 29, 2012 at 4:18 pm
    [ https://issues.apache.org/jira/browse/HIVE-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264582#comment-13264582 ]

    Edward Capriolo commented on HIVE-2988:
    ---------------------------------------

    I think the patch is great, but the JDK default heap is 512 MB in recent JDKs. I know 4K of RAM got a man to the moon, but why are you running with such a low default?
  • Rohini Palaniswamy (JIRA) at Apr 30, 2012 at 6:16 pm
    [ https://issues.apache.org/jira/browse/HIVE-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265073#comment-13265073 ]

    Rohini Palaniswamy commented on HIVE-2988:
    ------------------------------------------

    I ran with 128M to investigate the OOM. We have resorted to running with a 1 GB -Xmx because we keep hitting OOMs with bigger tables in Hive. Other things contributed to the memory usage as well, mostly Path objects because of the large number of partitions, but those are absolutely needed. XMLEncoder is what created a great deal of garbage in a very short span and triggered GC; that would be easy to change without touching the core logic.

    We should be looking at fixing the root cause of the problem instead of continually increasing the memory requirements. Ours is a highly multi-tenant system, and there are a lot of other programs (Pig, etc.) running on the gateway as well, so being able to run with a lower heap (256-512 MB) will help.

    Found two other reports of this issue:
    http://mail-archives.apache.org/mod_mbox/hive-user/201106.mbox/%3CBANLkTik4THLNkxV87UygvqhoLri3UL9R3Q@mail.gmail.com%3E

    https://issues.apache.org/jira/browse/HIVE-1316
    - This fix increased the max heap size of the CLI client and disabled the GC overhead limit.
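Pending a serialization fix, the practical mitigation on a shared gateway is to cap the client heap explicitly. A hedged sketch; the exact variables depend on the Hive/Hadoop version (check bin/hive and hadoop-env.sh), and the 512 MB figure is only illustrative:

```shell
# Cap the Hive CLI's JVM heap on a multi-tenant gateway.
# HADOOP_HEAPSIZE (in MB) and HADOOP_CLIENT_OPTS are read by the Hadoop
# launcher scripts that the hive wrapper script delegates to.
export HADOOP_HEAPSIZE=512
export HADOOP_CLIENT_OPTS="-Xmx512m ${HADOOP_CLIENT_OPTS:-}"

hive -e 'SHOW TABLES'   # the CLI now runs under the reduced heap
```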
