Hi,

I've been looking over the db schema that Hive uses to store its
metadata (package.jdo) and I had some questions:

1. What do the field names in the TYPES table mean? TYPE1, TYPE2,
and TYPE_FIELDS are all unclear to me.
2. In the TBLS (tables) table, what is sd?
3. What does the SERDES table store?
4. What does the SORT_ORDER table store? It appears to describe the
ordering within a storage descriptor, which in turn appears to be
related to a partition. Do you envision having a table where different
partitions have different orders?
5. SDS (storage descriptor) table has a list of columns. Does this
imply that columnar storage is supported?
6. What is the relationship between a storage descriptor and a
partition? 1-1, 1-n?

Thanks.

Alan.


  • Prasad Chakka at Oct 7, 2008 at 10:50 pm
    Hi Alan,

    The objects are very closely associated with the Thrift API objects defined
    in src/contrib/hive/metastore/if/hive_metastore.thrift. That file contains
    descriptions of what each field is, and it should answer most of your questions.
    The ORM for this is at s/c/h/metastore/src/java/model/package.jdo.

    2) SD is storage descriptor (look at SDS table)
    3) SERDES contains information for Hive serializers and deserializers
    5) Tables and Partitions have Storage Descriptors. Storage Descriptors
    contain physical storage info and how to read the data (serde info). The
    Storage Descriptor object actually contains the columns. This means that
    different partitions can have different column sets.
    6) 1-1
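
    A minimal sketch of the relationships described in answers 2, 3, 5 and 6,
    assuming simplified Python stand-ins for the metastore objects. The class
    and field names below are hypothetical and are not taken from
    hive_metastore.thrift or package.jdo; the paths and the serde class name
    are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SerDeInfo:
    # Hypothetical stand-in for a row in the SERDES table.
    name: str
    serialization_lib: str


@dataclass
class StorageDescriptor:
    # Hypothetical stand-in for a row in the SDS table: where the data
    # lives, how to read it (serde), and which columns it has.
    location: str
    serde: SerDeInfo
    columns: List[Tuple[str, str]]  # (column name, type name)


@dataclass
class Partition:
    values: List[str]
    sd: StorageDescriptor  # one descriptor per partition (1-1)


@dataclass
class Table:
    name: str
    sd: StorageDescriptor  # table-level descriptor (the "sd" in TBLS)
    partitions: List[Partition] = field(default_factory=list)


def column_names(sd: StorageDescriptor) -> List[str]:
    """Return just the column names held by a storage descriptor."""
    return [name for name, _ in sd.columns]


serde = SerDeInfo("example", "example.HypotheticalSerDe")
old = Partition(
    ["2008-10-06"],
    StorageDescriptor("/warehouse/t/ds=2008-10-06", serde, [("id", "int")]),
)
new = Partition(
    ["2008-10-07"],
    StorageDescriptor(
        "/warehouse/t/ds=2008-10-07", serde, [("id", "int"), ("name", "string")]
    ),
)
table = Table(
    "t",
    StorageDescriptor("/warehouse/t", serde, [("id", "int"), ("name", "string")]),
    [old, new],
)
```

    Because each partition carries its own descriptor, `column_names(old.sd)`
    and `column_names(new.sd)` can differ within the same table.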

    Thanks,
    Prasad

    From: Alan Gates <gates@yahoo-inc.com>
    Reply-To: <core-user@hadoop.apache.org>
    Date: Tue, 7 Oct 2008 15:28:50 -0700
    To: <core-user@hadoop.apache.org>
    Subject: Questions regarding Hive metadata schema

  • Jeff Hammerbacher at Oct 8, 2008 at 12:53 am
    For translation purposes, SerDes in Hive correspond to
    StoreFunc/LoadFunc pairs in Pig and Producer/Extractor pairs in
    SCOPE.
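
    To make the pairing concrete, here is a rough sketch of the idea: one
    object that bundles a serializer with its matching deserializer. The
    class `DelimitedSerDe` and its methods are hypothetical and do not mirror
    the actual Hive, Pig, or SCOPE interfaces.

```python
from typing import List


class DelimitedSerDe:
    """Hypothetical SerDe: pairs a serializer with its deserializer.

    Assumes rows are non-empty lists of strings and that no field
    contains the delimiter character.
    """

    def __init__(self, delimiter: str = "\x01") -> None:
        self.delimiter = delimiter

    def serialize(self, row: List[str]) -> bytes:
        # Row object -> stored bytes (the "store" half of the pair).
        return self.delimiter.join(row).encode("utf-8")

    def deserialize(self, data: bytes) -> List[str]:
        # Stored bytes -> row object (the "load" half of the pair).
        return data.decode("utf-8").split(self.delimiter)


serde = DelimitedSerDe()
stored = serde.serialize(["1", "alan"])
restored = serde.deserialize(stored)
```

    Keeping both directions in one object is what lets the same library be
    reused for reading and writing.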

    I claim SCOPE's terminology is the most elegant and we should all
    standardize on their terminology, in this case at least. Joy claims
    that SerDe is a common term in the hardware community. Since Hive was
    mainly intended for hardware developers, ...wait a second, that's not
    right.

    (seriously though, we need some way to keep these things straight, and
    being able to reuse serialization/deserialization libraries would be
    nice).

  • Joydeep Sen Sarma at Oct 8, 2008 at 6:01 am
    There is quite a bit of difference in the scope (no pun intended) of these interfaces. The SCOPE paper says rows are sets of typed columns (and the paper's examples demonstrate that). Hive's SerDe/ObjectInspector interfaces allow plugging in objects with arbitrary levels of nesting and map/array types. They are also fairly extensible, since they started off being modeled after Java Reflection; adding enum types, for instance, would be pretty easy. The interface is friendly to lazy evaluation (the SCOPE one doesn't seem to be) and should eventually allow things like columnar compression/organization to be implemented transparently.
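
    As an illustration only, the two properties mentioned above (lazy
    evaluation and inspection of nested map/array values) might look roughly
    like this. `LazyRow` and `get_nested` are hypothetical names; this is
    not the Hive ObjectInspector API.

```python
from typing import Any, List, Optional, Sequence


class LazyRow:
    """Hypothetical lazily deserialized row.

    The raw bytes are split into fields only when a field is first
    requested, so rows that are never inspected cost no parsing work.
    """

    def __init__(self, raw: bytes, delimiter: bytes = b"\t") -> None:
        self._raw = raw
        self._delimiter = delimiter
        self._fields: Optional[List[bytes]] = None
        self.parse_count = 0  # how many times the raw bytes were split

    def field(self, index: int) -> bytes:
        if self._fields is None:
            self._fields = self._raw.split(self._delimiter)
            self.parse_count += 1
        return self._fields[index]


def get_nested(obj: Any, path: Sequence[Any]) -> Any:
    """Walk nested dicts (maps/structs) and lists (arrays) along a path."""
    for key in path:
        obj = obj[key]
    return obj


row = LazyRow(b"1\talan\tpig")
parses_before_access = row.parse_count
second = row.field(1)
third = row.field(2)
tag = get_nested({"user": {"tags": ["a", "b"]}}, ["user", "tags", 1])
```

    The caller never sees how the row is stored, which is the property that
    would let a different physical layout be swapped in behind the same
    accessors.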

    But yeah, the naming was dead casual (aren't you glad it's not called fbxyz?). Who knew it would ever come this far?

    -----Original Message-----
    From: hive-users-bounces@publists.facebook.com On Behalf Of Jeff Hammerbacher
    Sent: Tuesday, October 07, 2008 5:53 PM
    To: core-user@hadoop.apache.org
    Cc: hive-users@publists.facebook.com
    Subject: Re: [hive-users] Questions regarding Hive metadata schema


