Hive Error on medium sized dataset
Hi,
I'm attempting to load a small-to-medium-sized log file, ~250MB, and produce some basic reports from it: counts, etc. Nothing fancy. However, whenever I try to read the entire dataset, ~330k rows, I get the following error:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

This error is produced by basic queries like:

SELECT count(1) FROM medium_table;

However, if I do the following:

SELECT count(1) FROM ( SELECT col1 FROM medium_table LIMIT 70000 ) tbl;

It works okay until I get to around 70,800 rows, then I get the first error message again. I'm running my HDFS system in single-node, pseudo-distributed mode, in a virtual machine with 1.5 GB of memory and 20 GB of disk. I am also using a custom SerDe. I don't think it's the SerDe, but I'm open to suggestions for how I can check whether it is causing the problem. I can't see anything in the data that would be causing it, though.

Anyone have any ideas about what might be causing this, or something I can check?

Thanks,
Pat


  • Ajo Fod at Jan 27, 2011 at 1:57 am
    Any chance you can convert the data to a tab-separated text file and try
    the same query?

    It may not be the SerDe, but it would be good to rule it out as a
    potential source of the problem.

    -Ajo.
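
    Assuming the log lines can be flattened to tab-separated text first, a
    minimal sketch of that isolation test might look like the following (the
    table name, column list, and local path are all illustrative):

    CREATE TABLE medium_table_tsv (col1 STRING, col2 STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    LOAD DATA LOCAL INPATH '/tmp/medium_table.tsv' INTO TABLE medium_table_tsv;

    SELECT count(1) FROM medium_table_tsv;

    If the count succeeds here but still fails against the original table,
    the SerDe becomes the prime suspect.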
  • Hadoop n00b at Jan 27, 2011 at 5:09 am
    We typically get this error on our 4-node setup when a complex query's
    child JVM runs out of heap. I'd be interested in what the experts have
    to say about this error.
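
    (If heap is indeed the problem, the failed task attempt's log usually
    ends with java.lang.OutOfMemoryError: Java heap space, so that's a quick
    thing to check before retuning anything.)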
  • Christopher, Pat at Jan 27, 2011 at 6:28 pm
    It will be tricky to clean up the data format, as I'm operating on somewhat arbitrary key-value pairs in part of the record. I will try to create something similar, though; it might take a bit. Thanks.

    I've tried resetting the heap size, I think. I added the following block to my mapred-site.xml:

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512M</value>
    </property>

    Is that how I'm supposed to do that?
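
    For quicker experiments than editing mapred-site.xml and restarting, I
    gather the same property can also be overridden per session from the
    Hive CLI before running the query (a sketch; 512M is just an example
    value):

    SET mapred.child.java.opts=-Xmx512M;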

    Thanks,
    Pat

  • Christopher, Pat at Jan 27, 2011 at 7:22 pm
    I removed the part of the SerDe that handled the arbitrary key/value pairs, and I was able to process my entire data set. Sadly, the part I removed has all the interesting data.

    I'll play more with the heap settings and see if that lets me process the key/value pairs. Is the mapred-site.xml block in my previous message the correct way to set the child heap value?

    Thanks,
    Pat

  • Christopher, Pat at Jan 28, 2011 at 12:47 am
    It was the SerDe. There was a null pointer exception that was being reported to a Hadoop logfile rather than anywhere in Hive. I found the Hadoop log and fixed the problem.
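
    (For anyone hitting the same thing: on a stock 0.20-era
    pseudo-distributed install, the per-task logs typically live under
    ${HADOOP_HOME}/logs/userlogs/<task-attempt-id>/ as stdout, stderr, and
    syslog, and they're also browsable from the JobTracker web UI, usually
    at http://localhost:50030.)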

    Thanks for the help!

    Pat

  • Hadoop n00b at Jan 28, 2011 at 6:34 am
    Return code 2 essentially means a Hadoop error. Congrats on locating and
    fixing your issue.

    However, can somebody still throw some light on this particular error code?
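
    (My understanding so far: return code 2 from MapRedTask is Hive's
    generic "the MapReduce job failed" exit status, with the real exception,
    like the SerDe NPE here, appearing only in the failed task attempt's
    logs rather than in the Hive CLI output; I'd welcome a more
    authoritative explanation.)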

Discussion Overview
group: user@hive.apache.org
categories: hive, hadoop
posted: Jan 27, 2011 at 1:48 am
active: Jan 28, 2011 at 6:34 am
posts: 7
users: 3
website: hive.apache.org
