FAQ
Hi,

I'm new to impala and trying to create an external table.
The data is stored in a csv file.

As an example:

create external table foo(bar int)
row format delimited fields terminated by ','
stored as testfile location '/user/hive/data';

The values in my csv file are surrounded by quotation marks '"':
"0"
"1"

When trying to run a query that sums all values I get the following error:
Error converting column: 0 TO INT (Data is: "0")

Is there a way to convert the data to integers or specify that all values
in the csv file are surrounded with quotation marks which are not part of
the value?

Many thanks,
Monika

To unsubscribe from this group and stop receiving emails from it, send an email to impala-user+unsubscribe@cloudera.org.

Search Discussions

  • John Russell at Nov 26, 2013 at 6:02 pm
    Monika, you are correct that there is no syntax for the table creation like OPTIONALLY ENCLOSED BY, like you might see in Oracle SQL*Loader or other data-loading utilities.

    One choice is to rewrite the files to get rid of the quotation marks.

    If rewriting the data files is not practical for disk space or workflow reasons, a workaround could be to define the table with that column as a STRING, and make a view that reformats and casts the column to an integer type. For example:

    [localhost:21000] > create table quoted_strings (x string);
    Query: create table quoted_strings (x string)
    [localhost:21000] > insert into quoted_strings values ("0"),("1"),("100");
    Query: insert into quoted_strings values ("0"),("1"),("100")
    Inserted 3 rows in 0.99s
    [localhost:21000] > create view numbers as select cast(regexp_replace(x,'"','') as bigint) as x from quoted_strings;
    Query: create view numbers as select cast(regexp_replace(x,'"','') as bigint) as x from quoted_strings
    [localhost:21000] > select * from numbers;
    Query: select * from numbers
    Query finished, fetching results ...
    +-----+
    x | +-----+
    0 |
    1 |
    100 |
    +-----+
    Returned 3 row(s) in 0.42s
    [localhost:21000] > select sum(x) from numbers;
    Query: select sum(x) from numbers
    Query finished, fetching results ...
    +--------+
    sum(x) |
    +--------+
    101 |
    +--------+
    Returned 1 row(s) in 0.35s

    Cheers,
    John
    On Nov 26, 2013, at 7:56 AM, Monika Moser wrote:

    Hi,

    I'm new to impala and trying to create an external table.
    The data is stored in a csv file.

    As an example:

    create external table foo(bar int)
    row format delimited fields terminated by ','
    stored as testfile location '/user/hive/data';

    The values in my csv file are surrounded by quotation marks '"':
    "0"
    "1"

    When trying to run a query that sums all values I get the following error:
    Error converting column: 0 TO INT (Data is: "0")

    Is there a way to convert the data to integers or specify that all values in the csv file are surrounded with quotation marks which are not part of the value?

    Many thanks,
    Monika


    To unsubscribe from this group and stop receiving emails from it, send an email to impala-user+unsubscribe@cloudera.org.
    To unsubscribe from this group and stop receiving emails from it, send an email to impala-user+unsubscribe@cloudera.org.
  • Monika Moser at Nov 27, 2013 at 3:15 pm
    Hi John,

    many thanks for your answer.
    For now I rewrite the file.
    But good to know your solution with the view in case rewriting will not be
    practical for some reason in future.

    Monika

    Am Dienstag, 26. November 2013 19:02:42 UTC+1 schrieb John Russell:
    Monika, you are correct that there is no syntax for the table creation
    like OPTIONALLY ENCLOSED BY, like you might see in Oracle SQL*Loader or
    other data-loading utilities.

    One choice is to rewrite the files to get rid of the quotation marks.

    If rewriting the data files is not practical for disk space or workflow
    reasons, a workaround could be to define the table with that column as a
    STRING, and make a view that reformats and casts the column to an integer
    type. For example:

    [localhost:21000] > create table quoted_strings (x string);
    Query: create table quoted_strings (x string)
    [localhost:21000] > insert into quoted_strings values ("0"),("1"),("100");
    Query: insert into quoted_strings values ("0"),("1"),("100")
    Inserted 3 rows in 0.99s
    [localhost:21000] > create view numbers as select
    cast(regexp_replace(x,'"','') as bigint) as x from quoted_strings;
    Query: create view numbers as select cast(regexp_replace(x,'"','') as
    bigint) as x from quoted_strings
    [localhost:21000] > select * from numbers;
    Query: select * from numbers
    Query finished, fetching results ...
    +-----+
    x | +-----+
    0 |
    1 |
    100 |
    +-----+
    Returned 3 row(s) in 0.42s
    [localhost:21000] > select sum(x) from numbers;
    Query: select sum(x) from numbers
    Query finished, fetching results ...
    +--------+
    sum(x) |
    +--------+
    101 |
    +--------+
    Returned 1 row(s) in 0.35s

    Cheers,
    John

    On Nov 26, 2013, at 7:56 AM, Monika Moser wrote:

    Hi,

    I'm new to impala and trying to create an external table.
    The data is stored in a csv file.

    As an example:

    create external table foo(bar int)
    row format delimited fields terminated by ','
    stored as testfile location '/user/hive/data';

    The values in my csv file are surrounded by quotation marks '"':
    "0"
    "1"

    When trying to run a query that sums all values I get the following error:
    Error converting column: 0 TO INT (Data is: "0")

    Is there a way to convert the data to integers or specify that all values
    in the csv file are surrounded with quotation marks which are not part of
    the value?

    Many thanks,
    Monika


    To unsubscribe from this group and stop receiving emails from it, send an
    email to impala-user...@cloudera.org <javascript:>.

    To unsubscribe from this group and stop receiving emails from it, send an email to impala-user+unsubscribe@cloudera.org.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupimpala-user @
categorieshadoop
postedNov 26, '13 at 3:56p
activeNov 27, '13 at 3:15p
posts3
users2
websitecloudera.com
irc#hadoop

2 users in discussion

Monika Moser: 2 posts John Russell: 1 post

People

Translate

site design / logo © 2022 Grokbase