Hi Rakesh,
Your analysis is correct overall.
The delimiter specification in the DDL statement (create table ...) was
designed when we only allowed a single level of list or map.
If there are multiple levels, these delimiter specifications won't
work as you expect.
For now, please do the following when creating nested types.
1. Don't specify any delimiters when creating the table
2. When loading the data, the data should be formatted in this way:
A. Each level of list will take one level of delimiter, and each level
of map will take two levels of delimiters.
B. If it's a list of lists, the first list will be delimited by ^A, the
second will be delimited by ^B.
C. If it's a map of maps, the first map will take ^A and ^B, the second
will take ^C and ^D.
D. If it's a list of maps, the list will take ^A, the map will take ^B and ^C.
E. If it's a map of lists, the map will take ^A and ^B, the list will take ^C.
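The rules in A-E can be sketched as a small Python helper that formats
a nested value before loading. This is only an illustration, not code
from Hive: the function name encode and the level parameter are
hypothetical, and it assumes that the first of a map's two delimiters
separates the entries and the second separates a key from its value,
which the rules above do not spell out. The starting level is left as a
parameter because which control character the outermost collection
actually gets may depend on how the column separator is counted in your
setup.

```python
def encode(value, level=1):
    """Format nested lists/dicts with one control character per level.

    level=1 means ^A (\x01), level=2 means ^B (\x02), and so on.
    Hypothetical helper; the entry/key-value ordering of the two map
    delimiters is an assumption.
    """
    if isinstance(value, dict):
        entry_sep = chr(level)      # assumed: separates map entries
        kv_sep = chr(level + 1)     # assumed: separates key from value
        return entry_sep.join(
            encode(k, level + 2) + kv_sep + encode(v, level + 2)
            for k, v in value.items()
        )
    if isinstance(value, (list, tuple)):
        # A list consumes a single delimiter level.
        return chr(level).join(encode(item, level + 1) for item in value)
    return str(value)


# List of maps (rule D): the list takes ^A, each map takes ^B and ^C.
row = encode([{"k": "v"}, {"x": "y"}])
print(repr(row))
```

Passing a different level (for example encode(value, level=2)) shifts
every delimiter by the same amount, which is handy if the first control
character is already used to separate columns in your data.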
I hope this helps to solve your problem. We will allow customizable
delimiters in the future (please open a jira if you are dependent on
that).
Zheng
On Tue, Jul 7, 2009 at 12:00 PM, Rakesh Setty wrote:
I think this solution will not deal with maps within maps and lists within
lists.
Thanks,
Rakesh
________________________________
From: Rakesh Setty
Sent: Tuesday, July 07, 2009 11:37 AM
To: 'hive-user@hadoop.apache.org'
Subject: Issue with nested types
Hi,
The support for nested types added recently through JIRA
HIVE-603 is very useful. But I have an issue with the schema specification.
I have a table page_views with two columns. page_info is a map
with Ctrl-D as the key delimiter and Ctrl-C as the key-value pair (record)
delimiter. page_links is a list of maps, with list items separated by
Ctrl-B and the map delimiters being Ctrl-D and Ctrl-C as mentioned above.
In the DDL statement, if I do not specify the “collection items
terminated by” and “map keys terminated by” clauses, page_links is
deserialized properly, but page_info is not deserialized properly. If I
specify the clauses - collection items terminated by ‘\003’ and map keys
terminated by ‘\004’, page_info is deserialized properly but page_links is
not deserialized properly. The reason, I think, is that for page_links it
treats ‘\003’ (Ctrl-C) as the delimiter for both array items and map records. But I
have Ctrl-B as array delimiter and Ctrl-D as map record delimiter.
I think we should replace the clause “collection items
terminated by” with separate clauses like “list items terminated by” and
“map items terminated by”.
Thanks,
Rakesh
--
Yours,
Zheng