Dear hive-users,

I've written my own custom SerDe to handle some log files in a custom
format. As I'd like to (eventually) use the JDBC driver down the line,
I want to retain the column types in the output. Part of the reason
for the custom SerDe is that we're using OpenCSV
(http://opencsv.sourceforge.net/) to produce the files in the first
place, so it would be good to use it again to parse them when
querying in Hive.
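
To illustrate the kind of typed parsing described above, here is a minimal sketch in plain Java. It is not the poster's SerDe and it does not use OpenCSV: splitCsvLine is a hypothetical stand-in for the library's parser that only handles double-quoted fields and doubled quotes, and toTypedRow is a hypothetical helper that converts each field according to a list of lowercase Hive primitive type names. Treating unparseable numbers as NULL is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CsvRowSketch {

    /** Splits one CSV line into fields, honouring double quotes and "" escapes. */
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field
        return fields;
    }

    /** Converts raw fields to Java objects, one per declared column type. */
    static List<Object> toTypedRow(List<String> fields, List<String> types) {
        List<Object> row = new ArrayList<Object>();
        for (int i = 0; i < types.size(); i++) {
            // A short line yields null for the missing trailing columns.
            String raw = i < fields.size() ? fields.get(i) : null;
            row.add(convert(raw, types.get(i)));
        }
        return row;
    }

    /** Handles a few primitive types only; anything else is kept as a String. */
    static Object convert(String raw, String type) {
        if (raw == null) {
            return null;
        }
        try {
            switch (type.toLowerCase()) {
                case "int":    return Integer.valueOf(raw.trim());
                case "bigint": return Long.valueOf(raw.trim());
                case "double": return Double.valueOf(raw.trim());
                default:       return raw;
            }
        } catch (NumberFormatException e) {
            return null; // assumption: an unparseable value becomes NULL
        }
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList("string", "int");
        List<String> fields = splitCsvLine("\"hello, world\",42");
        System.out.println(toTypedRow(fields, types));
    }
}
```

In a real SerDe the row would be handed back from deserialize() together with a matching ObjectInspector; the sketch only covers the conversion from text to typed values.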

I've implemented my own SerDe, originally using
MetadataTypedColumnsetSerDe as a basis. However, whenever I run a
query, no data is returned, regardless of how much data I load
into the table. The load itself proceeds fine. I am using the version
of Hive from Cloudera's CDH3 distribution (based on 0.5.0).

My create table statement is:

CREATE TABLE my_test_table (col_name_1 STRING, col_name_2 INT, ...
etc) COMMENT 'Some comment' PARTITIONED BY (part_col_1 STRING,
part_col_2 STRING)
ROW FORMAT SERDE "com.my.package.named.MyNewSerDe" STORED AS TEXTFILE;

I have switched on the debug logging and put a bunch of debug
statements in my code and I've found that when I do a simple query
(like "select * from my_test_table limit 10;") so that it runs
locally, it does find the class. Indeed it calls the initialize method
and calls the getObjectInspector method a number of times.
Subsequently though, it calls initialize on LazySimpleSerDe three
times. The first two times it has dummy column names (_col0) and the
correct column types in the correct order. The last time it contains
no column names or types at all.
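
For readers trying to reproduce this kind of debugging, the following sketch shows one way to inspect the schema a SerDe receives. It assumes the usual convention that the Properties object passed to initialize carries the column names under the key "columns" (comma-separated) and the column types under "columns.types" (colon-separated in Hive releases of this era). The class and method names are hypothetical, and the simple split only works for primitive types, since complex types such as struct<a:int> contain the separators themselves.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

class SchemaFromPropertiesSketch {

    /** Returns the separated values of a table property, or an empty list if unset. */
    static List<String> splitProperty(Properties tbl, String key, String separator) {
        String value = tbl.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.split(separator));
    }

    /** Builds a one-line description of the schema, suitable for a debug log. */
    static String describe(Properties tbl) {
        // Assumed property keys; primitive column types only.
        List<String> names = splitProperty(tbl, "columns", ",");
        List<String> types = splitProperty(tbl, "columns.types", ":");
        if (names.isEmpty() || types.isEmpty()) {
            return "no schema supplied";
        }
        if (names.size() != types.size()) {
            return "mismatch: " + names.size() + " names, " + types.size() + " types";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(names.get(i)).append(' ').append(types.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Properties tbl = new Properties();
        tbl.setProperty("columns", "col_name_1,col_name_2");
        tbl.setProperty("columns.types", "string:int");
        System.out.println(describe(tbl));
        System.out.println(describe(new Properties()));
    }
}
```

Logging a description like this from each initialize call makes it easier to tell a call with the real table schema from one with placeholder names or with no schema at all.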

Presumably I'm missing something fairly simple from somewhere (a class
extension missing, wrong class returned by getSerializedClass() or
perhaps constructing the ObjectInspector incorrectly?) but for the
life of me I can't spot it. The underlying files are just CSVs
constructed using the OpenCSV library above.

I'd be very grateful for any suggestions.

Thanks,

Jamie


  • Jamie Cockrill at Oct 1, 2010 at 1:57 pm
    Dear all,

    I managed to fix this by starting from scratch and re-creating the
    table and loading data into it. There must have been something odd
    about the way I created my original table.

    Thanks,

    Jamie

Discussion Overview
group: user @
categories: hive, hadoop
posted: Sep 30, '10 at 9:21a
active: Oct 1, '10 at 1:57p
posts: 2
users: 1
website: hive.apache.org
