
[Hive-user] support for arrays, maps, structs while writing output of custom reduce script to table

Dilip Joseph
Mar 22, 2010 at 8:50 pm
Hello,

Does Hive currently support arrays, maps, and structs when using custom
reduce/map scripts? 'myreduce.py' in the example below produces an
array of structs delimited by \2 and \3 characters.

CREATE TABLE SS (
a INT,
b INT,
vals ARRAY<STRUCT<x:INT, y:STRING>>
);

FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
INSERT OVERWRITE TABLE SS
REDUCE *
USING 'myreduce.py'
AS
(a,b, vals)
;

However, the query fails with the following error message before the
script is even executed:

FAILED: Error in semantic analysis: line 2:27 Cannot insert into
target table because column number/types are different SS: Cannot
convert column 2 from string to array<struct<x:int,y:string>>.

I saw a discussion about this in
http://www.mail-archive.com/hive-user@hadoop.apache.org/msg00160.html,
dated over a year ago. Just wondering if there have been any updates.

Thanks,

Dilip

4 responses

  • Zheng Shao at Mar 22, 2010 at 9:20 pm
    Starting with Hive 0.5 (probably), you can add type information to the column names after "AS".
    Note that the first-level separator should be TAB, the second-level separator ^B, then ^C, and so on.
    FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
    INSERT OVERWRITE TABLE SS
    REDUCE *
    USING 'myreduce.py'
    AS
    (a INT, b INT, vals ARRAY<STRUCT<x:INT, y:STRING>>)
    ;
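For reference, here is a minimal sketch of a reduce script whose output follows the separator convention above: TAB between the top-level columns a, b and vals, ^B (\x02) between array elements, and ^C (\x03) between the x and y fields of each struct. The original myreduce.py was not posted, so everything here is hypothetical: the input layout `id<TAB>x<TAB>y`, the aggregation (one output row per id), and the helper names `format_row` and `reduce_lines`. It also assumes no value contains TAB, ^B, ^C, or a newline.

```python
import sys
from itertools import groupby

# Separators by nesting level, following the convention described above.
COL_SEP = "\t"      # level 1: between top-level columns a, b, vals
ITEM_SEP = "\x02"   # level 2 (^B): between elements of the ARRAY
FIELD_SEP = "\x03"  # level 3 (^C): between fields x and y of each STRUCT


def format_row(a, b, vals):
    """Serialize one row of (a INT, b INT, vals ARRAY<STRUCT<x:INT, y:STRING>>).

    vals is a list of (x, y) pairs. Assumes no value contains TAB, ^B, ^C or a newline.
    """
    items = [FIELD_SEP.join((str(x), y)) for x, y in vals]
    return COL_SEP.join((str(a), str(b), ITEM_SEP.join(items)))


def reduce_lines(lines):
    """Hypothetical reduce logic over input rows 'id<TAB>x<TAB>y', already sorted by id.

    Emits one row per id: a = id, b = number of input rows, vals = their (x, y) pairs.
    """
    rows = (line.rstrip("\n").split(COL_SEP) for line in lines)
    for key, group in groupby(rows, key=lambda r: r[0]):
        vals = [(int(x), y) for _, x, y in group]
        yield format_row(int(key), len(vals), vals)


def main():
    # Hive streams the reducer input to the script's stdin and reads its stdout.
    for out in reduce_lines(sys.stdin):
        print(out)
```

Saved as myreduce.py, the file would end with the usual `if __name__ == "__main__": main()` guard. The (id, x, y) input layout only stands in for srcTable's real columns, which the thread does not show; `groupby` is enough here because DISTRIBUTE BY id SORT BY id delivers each id's rows contiguously.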

    --
    Yours,
    Zheng
  • Dilip Joseph at Mar 22, 2010 at 9:43 pm
    Thanks, Zheng. That worked.

    It appears that the table's type information is converted to lower case before
    comparison, while the type declared after AS is not. The following statements,
    where "userId" is used as a field name, failed:

    hive> CREATE TABLE SS (
    a INT,
    b INT,
    vals ARRAY<STRUCT<userId:INT, y:STRING>>
    );
    OK
    Time taken: 0.309 seconds
    hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
    INSERT OVERWRITE TABLE SS
    REDUCE *
    USING 'myreduce.py'
    AS
    (a INT,
    b INT,
    vals ARRAY<STRUCT<userId:INT, y:STRING>>
    )
    ;
    FAILED: Error in semantic analysis: line 2:27 Cannot insert into
    target table because column number/types are different SS: Cannot
    convert column 2 from array<struct<userId:int,y:string>> to
    array<struct<userid:int,y:string>>.

    The same queries worked fine after changing "userId" to "userid".
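The mismatch can be modeled with a small, hypothetical Python sketch; this is not Hive's actual type-checking code. The two type strings are copied from the error message: the table's type has a lower-cased field name, while the type declared after AS keeps "userId", so an exact string comparison fails. A case-insensitive comparison treats them as equal, which matches the observed workaround of renaming "userId" to "userid". Lower-casing the entire string is a simplification that only works because Hive type keywords (INT/int, STRING/string) are case-insensitive as well.

```python
def same_type_strict(script_type, table_type):
    # Exact string comparison: "userId" and "userid" are treated as different.
    return script_type == table_type


def same_type_ignoring_case(script_type, table_type):
    # Compare after lower-casing both sides, so struct field names (and type
    # keywords such as INT/int) match regardless of case.
    return script_type.lower() == table_type.lower()


# Type strings taken from the error message above.
script_side = "array<struct<userId:int,y:string>>"  # as declared after AS
table_side = "array<struct<userid:int,y:string>>"   # as reported for table SS
```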

    Dilip

    --
    _________________________________________
    Dilip Antony Joseph
    http://www.marydilip.info
  • Zheng Shao at Mar 22, 2010 at 10:27 pm
    Great!

    This is a bug. Hive field names should be case-insensitive. Can you
    open a JIRA for that?

    Zheng
  • Dilip Joseph at Mar 23, 2010 at 4:30 pm
    Opened JIRA https://issues.apache.org/jira/browse/HIVE-1271

    Dilip

