Grokbase Groups Hive user July 2009
Hi,

The nested-type support added recently through JIRA HIVE-603 is very useful, but I have a problem with the schema specification.
I have a table page_views with two columns. page_info is a map that uses Ctrl-D as the key delimiter and Ctrl-C as the key-value pair (record) delimiter. page_links is a list of maps in which list items are separated by Ctrl-B and each map uses the same Ctrl-D and Ctrl-C delimiters as above.
In the DDL statement, if I do not specify the "collection items terminated by" and "map keys terminated by" clauses, page_links is deserialized properly but page_info is not. If I specify collection items terminated by '\003' and map keys terminated by '\004', page_info is deserialized properly but page_links is not. I think the reason is that for page_links Hive treats '\003' (Ctrl-C) as the delimiter for both array items and map records, whereas my data uses Ctrl-B as the array delimiter and Ctrl-C as the map record delimiter.
I think we should replace the "collection items terminated by" clause with separate clauses such as "list items terminated by" and "map items terminated by".
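
To make the data layout concrete, here is a minimal Python sketch of how the two columns described above would be encoded, and of what happens if a single delimiter is applied to both list items and map records. The sample keys and values are made up for illustration, and `naive_items` only mimics the behaviour guessed at in this post; it is not Hive's actual deserializer.

```python
# Control characters used as delimiters in the post (constant names are illustrative).
CTRL_B = "\x02"  # separates items of the page_links list
CTRL_C = "\x03"  # separates key-value pairs (records) within a map
CTRL_D = "\x04"  # separates a key from its value


def encode_map(mapping):
    """Serialize a dict as key^Dvalue pairs joined by ^C."""
    return CTRL_C.join(f"{k}{CTRL_D}{v}" for k, v in mapping.items())


def encode_list_of_maps(maps):
    """Serialize a list of dicts, joining the encoded maps with ^B."""
    return CTRL_B.join(encode_map(m) for m in maps)


def naive_items(raw, collection_delim=CTRL_C):
    """Split a list column on one collection delimiter.

    Assumption: this models the suspected behaviour (the same delimiter
    used for list items and map records), not Hive's real code.
    """
    return raw.split(collection_delim)


# Hypothetical sample row.
page_info = encode_map({"url": "/home", "ref": "/search"})
page_links = encode_list_of_maps([
    {"href": "/a", "text": "A"},
    {"href": "/b", "text": "B"},
])

# Splitting on ^B recovers the two list items; splitting on ^C does not.
print(len(page_links.split(CTRL_B)))  # 2
print(len(naive_items(page_links)))   # 3
```

With the list split on Ctrl-C, the item boundaries fall in the middle of the maps, so the second piece mixes the tail of one map with the head of the next.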

Thanks,
Rakesh


  • Rakesh Setty at Jul 7, 2009 at 7:08 pm
    I think the solution I proposed will not handle maps within maps or lists within lists.

    Thanks,
    Rakesh

    ________________________________
    From: Rakesh Setty
    Sent: Tuesday, July 07, 2009 11:37 AM
    To: 'hive-user@hadoop.apache.org'
    Subject: Issue with nested types

  • Zheng Shao at Jul 7, 2009 at 8:24 pm
    Hi Rakesh,

    Your analysis is correct overall.

    The delimiter clauses in the DDL statement (create table ...) were
    designed when we only allowed a single level of list or map.
    With multiple levels of nesting, these delimiter specifications won't
    work as you expect.

    For now, please do the following when creating nested types:
    1. Don't specify any delimiters when creating the table.
    2. When loading the data, format it as follows:
       A. Each level of list takes one level of delimiter, and each level
          of map takes two levels of delimiters.
       B. For a list of lists, the first list is delimited by ^A and the
          second by ^B.
       C. For a map of maps, the first map takes ^A and ^B, and the second
          takes ^C and ^D.
       D. For a list of maps, the list takes ^A and the map takes ^B and ^C.
       E. For a map of lists, the map takes ^A and ^B, and the list takes ^C.
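
The rules above can be sketched as a small Python function that walks a nested type and hands out delimiter levels: one per list, two per map. The tuple notation for types, the function name, and the `level` parameter are all hypothetical, introduced only for this illustration. Keys are assumed to be primitive, so only map values are descended into.

```python
# Type specs are plain nested tuples (a made-up notation for this sketch):
#   "prim"                 a primitive value
#   ("list", elem)         a list of elem
#   ("map", key, value)    a map from key to value


def assign_delimiters(spec, level=0):
    """Return [(kind, [delimiter names])] for each list or map in spec.

    Level 0 is ^A, level 1 is ^B, and so on. A list consumes one level;
    a map consumes two (entry separator, then key/value separator).
    """
    def name(n):
        return "^" + chr(ord("A") + n)

    if spec == "prim":
        return []
    kind = spec[0]
    if kind == "list":
        return [("list", [name(level)])] + assign_delimiters(spec[1], level + 1)
    if kind == "map":
        # Assumption: keys are primitive, so only the value type nests further.
        here = [("map", [name(level), name(level + 1)])]
        return here + assign_delimiters(spec[2], level + 2)
    raise ValueError(f"unknown type spec: {spec!r}")


list_of_map = ("list", ("map", "prim", "prim"))
print(assign_delimiters(list_of_map))           # [('list', ['^A']), ('map', ['^B', '^C'])]
print(assign_delimiters(list_of_map, level=1))  # [('list', ['^B']), ('map', ['^C', '^D'])]
```

Starting at `level=0` reproduces cases B to E as written. Starting at `level=1` gives the layout Rakesh reports as working for page_links without any delimiter clauses (list on ^B, map on ^C and ^D). That would be consistent with the first level being taken up elsewhere, for example by the separator between columns, but that reading is an assumption and worth checking against your Hive version.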


    I hope this helps solve your problem. We will allow customizable
    delimiters for nested types in the future (please open a JIRA issue
    if you depend on that).

    Zheng

