NullPointerException in GenericUDTFExplode.process()
Hive user mailing list, August 2010
Hi,

I think I may have run into a Hive bug, and I'm not sure what's causing it
or how to work around it.

The reduce task log contains this exception:

java.io.IOException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:227)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:70)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:46)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:43)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:218)

This works fine for millions of rows of data, but the one row below causes
the whole job to fail. Looking at the row, I don't see anything that
distinguishes it... if I knew what it was about the row that caused the
problem, I could filter it out beforehand. I don't mind losing one row in a
million.

2010-08-05^A15^A^AUS^A1281022768^Af^A97^Aonline car insurance quote^Aborderdisorder.com^A\N^A^A1076^B1216^B1480^B1481^B1493^B1496^B1497^B1504^B1509^B1686^B1724^B1729^B1819^B1829^B1906^B1995^B2018^B2025^B421^B426^B428^B433^B436^B449^B450^B452^B462^B508^B530^B-
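
For reference, a quick way to see how Hive actually deserializes that row's two array columns is to select them back with IS NULL and size(); a minimal sketch, assuming the row can be picked out by its stamp value and that size() returns -1 for a NULL array, as in stock Hive:

-- Inspect how the suspect row's array columns come back from the SerDe.
-- The stamp value is taken from the sample row above.
SELECT
  receiver_code_list IS NULL,
  size(receiver_code_list),
  kl IS NULL,
  size(kl)
FROM tmp3
WHERE stamp = 1281022768;

If receiver_code_list (or kl) comes back NULL for this row but not for its neighbors, that would explain why explode() fails only once the row is included.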

The source table and query are:

CREATE TABLE IF NOT EXISTS tmp3 (
  dt STRING,
  hr STRING,
  fld1 STRING,
  fld2 STRING,
  stamp BIGINT,
  fld3 STRING,
  fld4 INT,
  rk STRING,
  rd STRING,
  rq STRING,
  kl ARRAY<STRING>,
  receiver_code_list ARRAY<STRING>
)
ROW FORMAT DELIMITED
STORED AS SEQUENCEFILE;
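
Since no delimiters are given, the table relies on Hive's defaults: fields terminated by \001 (the ^A bytes in the sample row), collection items terminated by \002 (^B), and the literal \N as the NULL marker. Written out explicitly on a reduced table (a sketch; tmp3_delims is a hypothetical name, not from the thread):

-- Equivalent to the defaults used by tmp3 above.
CREATE TABLE IF NOT EXISTS tmp3_delims (
  dt STRING,
  receiver_code_list ARRAY<STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\002'
STORED AS SEQUENCEFILE;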

-- The LIMIT 88 below is so that the one bad row is included; with LIMIT 87
-- it works without failure.
SELECT count(1)
FROM (select receiver_code_list from tmp3 limit 88) tmp5
LATERAL VIEW explode(receiver_code_list) rcl AS receiver_code;

Any tips on what is wrong, or how else I might go about debugging it, would
be appreciated. A way to have it skip rows that cause errors would also be
an acceptable solution.
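
One possible workaround, assuming the failure is triggered by rows whose receiver_code_list deserializes to NULL or to an empty array (an assumption; the thread does not pin down the root cause), is to filter such rows out before the LATERAL VIEW; a sketch:

SELECT count(1)
FROM (
  SELECT receiver_code_list
  FROM tmp3
  -- Drop rows whose array is NULL or empty before explode() ever sees them.
  WHERE receiver_code_list IS NOT NULL
    AND size(receiver_code_list) > 0
  LIMIT 88
) tmp5
LATERAL VIEW explode(receiver_code_list) rcl AS receiver_code;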

Thanks,
Marc

  • Paul Yang at Aug 9, 2010 at 3:15 am
    Seems like an issue that was patched already - can you check whether the column you are calling explode() with has any null values?
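
    A quick way to run that check (a sketch; it assumes the size() UDF is available and, as in stock Hive, returns -1 for a NULL array):

    -- Count NULL arrays and empty arrays in the column passed to explode().
    SELECT
      sum(if(receiver_code_list IS NULL, 1, 0)) AS null_arrays,
      sum(if(size(receiver_code_list) = 0, 1, 0)) AS empty_arrays
    FROM tmp3;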

  • Marc Limotte at Aug 9, 2010 at 6:33 pm
    Hi Paul,

    No nulls. I ensure that every row has at least one entry (a hyphen) before
    I split to create the list.

    Marc
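
    One subtlety worth ruling out (a sketch with hypothetical expressions, since the code that builds the list is not shown in this thread): if the string being split is itself NULL, concat() and split() both propagate the NULL, so appending a hyphen with concat() before splitting would still leave a NULL array rather than a one-element array:

    -- Both expressions should evaluate to true: concat() and split()
    -- return NULL when given a NULL input.
    SELECT
      concat(cast(NULL AS STRING), '-') IS NULL,
      split(cast(NULL AS STRING), ',') IS NULL
    FROM tmp3
    LIMIT 1;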
  • Marc Limotte at Aug 10, 2010 at 12:55 am
    Also wanted to mention that I'm using the Cloudera distribution of Hive
    (0.5.0+20-2) on CentOS.

    Marc
