Grokbase Groups Hive user May 2011
Hi,

I am using this to load the apache log into Hadoop via Hive (my version is
0.4.1).

CREATE TABLE apache_log (
...
logdate STRING,
...
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^ ]*) ([^ ]*) ([^ ]*)
\\[(\\w+\/\\w+\/\\w+)\:(\\d+:\\d+:\\d+) ...
...

The date is coming in this format: dd/mmm/yyyy.
I would like to be able to load the data using this date format:
yyyy-mmm-dd.

1. Has anyone done this before, loading the date in a different format?
2. Also, how do you specify in the create table statement above that the
partition is the logdate?
3. And when I tried to convert the old date into unixtime format via this
sql, hive complains.

hive> select from_unixtime( unix_timestamp( logdate, 'dd/MMM/yyyy')) from
apache_log;
FAILED: Error in semantic analysis: line 1:7 Function Argument Type Mismatch
from_unixtime: Looking for UDF "from_unixtime" with parameters [class
org.apache.hadoop.io.LongWritable]

Has anyone encountered these issues before?

Thanks.


  • Jov at May 6, 2011 at 11:36 pm
    On 2011-5-7 at 6:48 AM, "bichonfrise74" <bichonfrise74@gmail.com> wrote:
    hive> select from_unixtime( unix_timestamp( logdate, 'dd/MMM/yyyy')) from
    apache_log;
    FAILED: Error in semantic analysis: line 1:7 Function Argument Type
    Mismatch from_unixtime: Looking for UDF "from_unixtime" with parameters
    [class org.apache.hadoop.io.LongWritable]

    The unix_timestamp function returns bigint, while the from_unixtime function
    accepts only int as its parameter, so you should use a cast:
    from_unixtime(cast(unix_timestamp(logdate, 'dd/MMM/yyyy') as int))
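Building on the cast above, here is a minimal sketch for question 1 (re-rendering the date in another format). It assumes the two-argument form from_unixtime(unixtime, pattern) is available in your Hive build, and that the month is meant as a three-letter abbreviation, hence MMM; in Java date patterns, lowercase mm means minutes. The alias logdate_new is made up for illustration.

```sql
-- Parse dd/MMM/yyyy, then print the value back as yyyy-MMM-dd.
-- The CAST works around from_unixtime accepting only int in this version.
SELECT from_unixtime(
         CAST(unix_timestamp(logdate, 'dd/MMM/yyyy') AS INT),
         'yyyy-MMM-dd'
       ) AS logdate_new
FROM apache_log;
```

Using 'yyyy-MM-dd' as the output pattern instead would give a numeric month, which sorts correctly as a string. Newer Hive releases document from_unixtime as taking bigint, so the cast should not be needed there.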
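For question 2 (partitioning by logdate), one possible approach is to keep apache_log as a staging table and copy rows into a second, partitioned table. This is only a sketch under assumptions: the table name apache_log_by_date and the column host are hypothetical placeholders for your real columns, and the date literals are example values. In Hive, a partition column is declared in PARTITIONED BY and must not be repeated in the regular column list.

```sql
-- Hypothetical target table; replace host with the real columns.
CREATE TABLE apache_log_by_date (
  host STRING
)
PARTITIONED BY (logdate STRING);

-- Static partition insert: one statement per date value.
-- The WHERE clause matches the raw dd/MMM/yyyy string in the staging table.
INSERT OVERWRITE TABLE apache_log_by_date PARTITION (logdate = '2011-05-06')
SELECT host
FROM apache_log
WHERE logdate = '06/May/2011';
```

The partition value is written in the new format while the filter uses the original one, so the format change and the partitioning happen in the same step.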

Discussion Overview
group: user @
categories: hive, hadoop
posted: May 6, '11 at 10:48p
active: May 6, '11 at 11:36p
posts: 2
users: 2
website: hive.apache.org
