Grokbase Groups Hive user June 2010
FAQ
Hi all,

I have an input table which has 3 columns one of which is an array list,



bcookie, id, info [Schema: string, string,
arraylist<map<string,string>>]



Here is a sample row



1245, 1, {[myid#id2, myage#100], [myid#id3, myage#101], [myid#id3,
myage#102]}



I wanted to explode the arraylist column and create an additional column
which represents the index of the info



bcookie, id, info, index

1245, 1, [myid#id2, myage#100], 0

1245, 1, [myid#id3, myage#101], 1

1245, 1, [myid#id3, myage#102] , 2



I was thinking of the lines of:

SELECT * FROM srctable LATERAL VIEW explode(info)... ;



Is UDTF the only way to achieve this?

Thanks Viraj

Search Discussions

  • Paul Yang at Jun 29, 2010 at 10:01 pm
    Yes, the UDTF explode() is probably the easiest way to get multiple rows from an array. Another way would be to write your own custom script and use the transform clause. To generate the index of the info, you'll need the field() UDF, but there is currently an issue with it:

    https://issues.apache.org/jira/browse/HIVE-1385


    From: Viraj Bhat
    Sent: Tuesday, June 29, 2010 2:06 PM
    To: hive-user@hadoop.apache.org
    Subject: Best way to create a view with index from explode ?

    Hi all,
    I have an input table which has 3 columns one of which is an array list,

    bcookie, id, info [Schema: string, string, arraylist<map<string,string>>]

    Here is a sample row

    1245, 1, {[myid#id2, myage#100], [myid#id3, myage#101], [myid#id3, myage#102]}

    I wanted to explode the arraylist column and create an additional column which represents the index of the info

    bcookie, id, info, index
    1245, 1, [myid#id2, myage#100], 0
    1245, 1, [myid#id3, myage#101], 1
    1245, 1, [myid#id3, myage#102] , 2

    I was thinking of the lines of:
    SELECT * FROM srctable LATERAL VIEW explode(info)... ;

    Is UDTF the only way to achieve this?
    Thanks Viraj
  • Viraj Bhat at Jun 30, 2010 at 12:09 am
    Hi Paul,

    Thanks for your inputs. I was looking at the Generic explode UDTF in
    your piggybank in the contrib. directory.

    Here are questions:

    1) Does the explode have a contract that if we take in an array of
    maps we return only the maps, not maps and another column (index) with
    it.

    2) Second do UDTF's have some restriction that I cannot have a
    WHERE clause or select other columns. I want to write something like
    this:

    select bcookie, Myexplode(info) as (newcol, index) from myoldtable
    where bcookie is not null;

    3) For adding the additional index column, do I need to override
    the following method [[public void process(Object[] o) throws
    HiveException ]]

    4) Do you have other examples of writing UDTF's

    Any insights would be appreciated.

    Viraj



    ________________________________

    From: Paul Yang
    Sent: Tuesday, June 29, 2010 3:01 PM
    To: hive-user@hadoop.apache.org
    Subject: RE: Best way to create a view with index from explode ?



    Yes, the UDTF explode() is probably the easiest way to get multiple rows
    from an array. Another way would be to write your own custom script and
    use the transform clause. To generate the index of the info, you'll need
    the field() UDF, but there is currently an issue with it:



    https://issues.apache.org/jira/browse/HIVE-1385





    From: Viraj Bhat
    Sent: Tuesday, June 29, 2010 2:06 PM
    To: hive-user@hadoop.apache.org
    Subject: Best way to create a view with index from explode ?



    Hi all,

    I have an input table which has 3 columns one of which is an array list,



    bcookie, id, info [Schema: string, string,
    arraylist<map<string,string>>]



    Here is a sample row



    1245, 1, {[myid#id2, myage#100], [myid#id3, myage#101], [myid#id3,
    myage#102]}



    I wanted to explode the arraylist column and create an additional column
    which represents the index of the info



    bcookie, id, info, index

    1245, 1, [myid#id2, myage#100], 0

    1245, 1, [myid#id3, myage#101], 1

    1245, 1, [myid#id3, myage#102] , 2



    I was thinking of the lines of:

    SELECT * FROM srctable LATERAL VIEW explode(info)... ;



    Is UDTF the only way to achieve this?

    Thanks Viraj
  • Paul Yang at Jun 30, 2010 at 11:39 pm
    1. In the current implementation, explode does have that contract.

    2. WHERE is allowed, but there was a bug with predicate push down that caused a NPE to be thrown. There's a patch for this at https://issues.apache.org/jira/browse/HIVE-1056 or you can add 'set hive.optimize.ppd=false;' before you run your queries.

    3. Yes, you would need to override this method.

    4. If you want to output two columns, check out GenericUDTFExplode2 in contrib for an example on how this can be done.


    From: Viraj Bhat
    Sent: Tuesday, June 29, 2010 5:06 PM
    To: hive-user@hadoop.apache.org
    Subject: RE: Best way to create a view with index from explode ?

    Hi Paul,
    Thanks for your inputs. I was looking at the Generic explode UDTF in your piggybank in the contrib. directory.
    Here are questions:
    1) Does the explode have a contract that if we take in an array of maps we return only the maps, not maps and another column (index) with it.
    2) Second do UDTF's have some restriction that I cannot have a WHERE clause or select other columns. I want to write something like this:
    select bcookie, Myexplode(info) as (newcol, index) from myoldtable where bcookie is not null;
    3) For adding the additional index column, do I need to override the following method [[public void process(Object[] o) throws HiveException ]]
    4) Do you have other examples of writing UDTF's
    Any insights would be appreciated.
    Viraj

    ________________________________
    From: Paul Yang
    Sent: Tuesday, June 29, 2010 3:01 PM
    To: hive-user@hadoop.apache.org
    Subject: RE: Best way to create a view with index from explode ?

    Yes, the UDTF explode() is probably the easiest way to get multiple rows from an array. Another way would be to write your own custom script and use the transform clause. To generate the index of the info, you'll need the field() UDF, but there is currently an issue with it:

    https://issues.apache.org/jira/browse/HIVE-1385


    From: Viraj Bhat
    Sent: Tuesday, June 29, 2010 2:06 PM
    To: hive-user@hadoop.apache.org
    Subject: Best way to create a view with index from explode ?

    Hi all,
    I have an input table which has 3 columns one of which is an array list,

    bcookie, id, info [Schema: string, string, arraylist<map<string,string>>]

    Here is a sample row

    1245, 1, {[myid#id2, myage#100], [myid#id3, myage#101], [myid#id3, myage#102]}

    I wanted to explode the arraylist column and create an additional column which represents the index of the info

    bcookie, id, info, index
    1245, 1, [myid#id2, myage#100], 0
    1245, 1, [myid#id3, myage#101], 1
    1245, 1, [myid#id3, myage#102] , 2

    I was thinking of the lines of:
    SELECT * FROM srctable LATERAL VIEW explode(info)... ;

    Is UDTF the only way to achieve this?
    Thanks Viraj

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categorieshive, hadoop
postedJun 29, '10 at 9:08p
activeJun 30, '10 at 11:39p
posts4
users2
websitehive.apache.org

2 users in discussion

Viraj Bhat: 2 posts Paul Yang: 2 posts

People

Translate

site design / logo © 2021 Grokbase