Grokbase Groups Hive user July 2010
We are importing Hadoop logs into Hive, but are running into some issues.
Sample log lines:
2010-02-25 14:27:18,000 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:

Query: SELECT * FROM logs_temp;
runs fine for the log line above.

However, for the log lines:
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at cm-hadoop01.mozilla.org/10.2.72.53
************************************************************/

the same query, SELECT * FROM logs_temp;, fails with:
Failed with exception java.io.IOException:java.lang.NullPointerException

However, SELECT count(1) FROM logs_temp;
returns 3, which is the correct number of rows.

Table structure given below:
add jar /usr/lib/hive/lib/hive_contrib.jar;
CREATE EXTERNAL TABLE logs_temp (
  line_date STRING,
  line_time STRING,
  message_type STRING,
  classname STRING,
  message STRING
)
PARTITIONED BY (ds STRING, ts STRING, hn STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^(\\d{4}(?>-\\d{2}){2})\\s((?>\\d{2}[:,]){3}\\d{3})\\s([A-Z]+)\\s([^:]+):\\s(.*)"
)
STORED AS TEXTFILE;



Any idea on what might be going wrong here?

-Anurag
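
One way to narrow this down is to try the "input.regex" from the table definition against the sample lines outside Hive. The sketch below does that with java.util.regex, on the assumption that the SerDe requires the whole line to match the pattern. The class name InputRegexCheck and the parse helper are made up for illustration and are not part of Hive; returning null for a non-matching line is only this sketch's convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for testing the table's input.regex outside Hive.
class InputRegexCheck {
    // Same pattern as "input.regex" in the CREATE TABLE statement;
    // the backslash escaping is identical in a Java string literal.
    static final Pattern INPUT_REGEX = Pattern.compile(
        "^(\\d{4}(?>-\\d{2}){2})\\s((?>\\d{2}[:,]){3}\\d{3})\\s([A-Z]+)\\s([^:]+):\\s(.*)");

    // Returns the five captured columns, or null if the whole line does not match.
    static String[] parse(String line) {
        Matcher m = INPUT_REGEX.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] cols = new String[m.groupCount()];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = m.group(i + 1);
        }
        return cols;
    }

    public static void main(String[] args) {
        String[] lines = {
            "2010-02-25 14:27:18,000 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:",
            "/************************************************************",
            "SHUTDOWN_MSG: Shutting down TaskTracker at cm-hadoop01.mozilla.org/10.2.72.53",
            "************************************************************/"
        };
        for (String line : lines) {
            String label = (parse(line) == null) ? "NO MATCH: " : "MATCH:    ";
            System.out.println(label + line);
        }
    }
}
```

Only the timestamped line fits the pattern. The three banner lines do not start with a date, so nothing can be extracted from them.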


  • Parag Arora at Jul 31, 2010 at 6:45 am
    It seems that your SerDe output must have been null.

    --
    Parag
    http://www.paragarora.com
    Phone: +91.8080350130
  • Anurag Phadke at Jul 31, 2010 at 6:51 am
    It's the regex: it fails when it sees a line that does not fit the pattern, such as /***************.
    Any tips on what can be done to fix this?


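
The thread ends without an answer, so the following is only a sketch of one possible approach, not a confirmed fix. It makes the structured prefix of the pattern optional, so that every line matches and lines without a timestamp land entirely in the last group (the message column). The class name LenientInputRegex and the parse helper are hypothetical. Whether RegexSerDe in this Hive version turns the unmatched groups into NULL columns is an assumption that has not been checked here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a lenient variant of the table's input.regex.
class LenientInputRegex {
    // The date/time/level/class prefix is wrapped in an optional
    // non-capturing group, so the trailing (.*) can match any line alone.
    static final Pattern LENIENT = Pattern.compile(
        "^(?:(\\d{4}(?>-\\d{2}){2})\\s((?>\\d{2}[:,]){3}\\d{3})\\s([A-Z]+)\\s([^:]+):\\s)?(.*)$");

    // Returns five columns; the first four are null when the prefix is absent.
    static String[] parse(String line) {
        Matcher m = LENIENT.matcher(line);
        if (!m.matches()) {
            return null; // not expected for single-line input
        }
        String[] cols = new String[m.groupCount()];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = m.group(i + 1);
        }
        return cols;
    }

    public static void main(String[] args) {
        String[] lines = {
            "2010-02-25 14:27:18,000 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:",
            "/************************************************************"
        };
        for (String line : lines) {
            System.out.println(String.join(" | ", java.util.Arrays.toString(parse(line))));
        }
    }
}
```

If this pattern were used as "input.regex", the banner rows could then be filtered in queries with a condition such as line_date IS NOT NULL, again assuming the SerDe maps unmatched groups to NULL. Stripping the banner lines from the log files before loading them is another option.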
