Make DynamicSerDe capable of skipping fields that will not be used in the query
-------------------------------------------------------------------------------

Key: HADOOP-4550
URL: https://issues.apache.org/jira/browse/HADOOP-4550
Project: Hadoop Core
Issue Type: New Feature
Components: contrib/hive
Reporter: Pete Wyckoff


Thrift/DynamicSerDe always deserializes every field in the record and converts it to the correct type. Often, only a few of the fields are actually used by the query.

e.g., select foo.user from foo where foo.created < 'today'

where foo is something like

struct {
string user
i64 created
string fullname
string description
i32 something
i32 somethingelse
...

}

Parsing fullname, description, something, and somethingelse is wasted work in this case.
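
To make the wasted work concrete, here is a minimal Java sketch of reading only the columns a query needs from a separator-delimited row. It is an illustration under stated assumptions, not DynamicSerDe code: the class SelectiveParseSketch, the method extract, and the sample row are all hypothetical, and the column order is assumed to follow the struct above.

```java
// Hypothetical illustration only; none of these names exist in Hive or Thrift.
final class SelectiveParseSketch {
    /**
     * Returns the raw text of the columns listed in wanted (ascending indexes).
     * Every other column is scanned past without being copied or converted,
     * and scanning stops as soon as the last wanted column has been read.
     */
    static String[] extract(String row, char sep, int[] wanted) {
        String[] out = new String[wanted.length];
        int col = 0;    // index of the column that begins at start
        int start = 0;  // offset where the current column begins
        int next = 0;   // position in wanted
        while (next < wanted.length && start <= row.length()) {
            int end = row.indexOf(sep, start);
            if (end < 0) {
                end = row.length(); // last column has no trailing separator
            }
            if (col == wanted[next]) {
                out[next++] = row.substring(start, end);
            }
            start = end + 1;
            col++;
        }
        return out;
    }
}
```

For the example query, only columns 0 (user) and 1 (created) would be requested, and only created would need a numeric conversion (for instance with Long.parseLong); the remaining four columns are never touched.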




--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Pete Wyckoff (JIRA) at Oct 30, 2008 at 6:27 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644065#action_12644065 ]

    Pete Wyckoff commented on HADOOP-4550:
    --------------------------------------

    I propose

    1. Add a 'skip' attribute to the field specification in the DynamicSerDe grammar. When this attribute is set on a field, DynamicSerDeFieldList will call protocol.skip for that field.
    2. Add an interface for protocols, something like TFastSkippable { void skip(type); }, or possibly per-type methods: skipI32, skipI64, skipString, skipList, ...
    3. Implement TFastSkippable for TCTLSeparatedProtocol.
    4. Modify the runtime to insert skip attributes into the runtime DDL passed to DynamicSerDe.
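
The four steps above could be sketched roughly as follows. This is a toy model under stated assumptions, not the actual implementation: only the name TFastSkippable and the idea of skip(type) come from the proposal, while FieldType, ToySeparatedProtocol, FieldSpec, and FieldListSketch are hypothetical stand-ins for the Thrift type constants, TCTLSeparatedProtocol, the grammar's field specification, and DynamicSerDeFieldList. The sketch assumes each row contains every declared field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Thrift's type constants.
enum FieldType { STRING, I32, I64 }

// Interface named in step 2; the exact signature here is a guess.
interface TFastSkippable {
    void skip(FieldType type);
}

// Toy separator-based protocol; NOT the real TCTLSeparatedProtocol.
final class ToySeparatedProtocol implements TFastSkippable {
    private final String row;
    private final char sep;
    private int pos = 0;
    int conversions = 0; // counts typed conversions, for illustration only

    ToySeparatedProtocol(String row, char sep) {
        this.row = row;
        this.sep = sep;
    }

    private int endOfField() {
        int end = row.indexOf(sep, pos);
        return end < 0 ? row.length() : end;
    }

    String readString() {
        int end = endOfField();
        String s = row.substring(pos, end);
        pos = end + 1;
        return s;
    }

    int readI32() { conversions++; return Integer.parseInt(readString()); }

    long readI64() { conversions++; return Long.parseLong(readString()); }

    @Override
    public void skip(FieldType type) {
        // In a text protocol every type is skipped the same way:
        // advance past the next separator without copying or converting.
        pos = endOfField() + 1;
    }
}

// Hypothetical field specification carrying the 'skip' attribute from step 1.
final class FieldSpec {
    final String name;
    final FieldType type;
    final boolean skip;

    FieldSpec(String name, FieldType type, boolean skip) {
        this.name = name;
        this.type = type;
        this.skip = skip;
    }
}

// Hypothetical stand-in for the field-list loop in DynamicSerDeFieldList.
final class FieldListSketch {
    static Map<String, Object> deserialize(ToySeparatedProtocol in, FieldSpec[] fields) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (FieldSpec f : fields) {
            if (f.skip) {
                in.skip(f.type); // field is not used by the query
                continue;
            }
            switch (f.type) {
                case STRING: record.put(f.name, in.readString()); break;
                case I32:    record.put(f.name, in.readI32());    break;
                case I64:    record.put(f.name, in.readI64());    break;
            }
        }
        return record;
    }
}
```

In this model, step 4 corresponds to the runtime building the FieldSpec array with skip set to true for every field the query does not reference. A single skip(type) method is enough for a text protocol; the per-type variants mentioned in step 2 would matter more for binary protocols, where the number of bytes to skip depends on the type.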

    This will need to be prioritized against other optimizations, but for TCTLSeparatedProtocol it is certainly a performance issue, and it may block replacing TMetadataTypedColumnsetSerDe with DynamicSerDe: TMetadataTypedColumnsetSerDe handles only strings, so its cost of not skipping is low.


Discussion Overview
group: common-dev @
categories: hadoop
posted: Oct 30, '08 at 6:21p
active: Oct 30, '08 at 6:27p
posts: 2
users: 1
website: hadoop.apache.org...
irc: #hadoop
