Pete Wyckoff (JIRA)
at Oct 30, 2008 at 6:27 pm
[ https://issues.apache.org/jira/browse/HADOOP-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644065#action_12644065 ]
Pete Wyckoff commented on HADOOP-4550:
--------------------------------------
I propose:
1. We add a 'skip' attribute to the field specification in the DynamicSerDe grammar. When this attribute is set on a field, DynamicSerDeFieldList will call protocol.skip for that field.
2. We add an interface for protocols, something like: TFastSkippable { void skip(type); }, or we may need skipI32, skipI64, skipString, skipList, ...
3. For TCTLSeparatedProtocol, we implement TFastSkippable.
4. We modify the runtime to insert skip attributes in the runtime DDL passed to DynamicSerDe.
This will need to be prioritized against other optimizations, but for TCTLSeparatedProtocol this is certainly a performance issue, and it may block replacing TMetadataTypedColumnsetSerDe with DynamicSerDe, since the former handles only strings and so its cost of not skipping is low.
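As a rough illustration of items 1 to 3, here is a minimal, self-contained sketch. Every name in it except TFastSkippable is hypothetical (FieldType, FieldSpec, DelimitedProtocol, FieldListReader), and it does not use the real Thrift or DynamicSerDe classes. It only shows the idea: a field marked skip is advanced past without being copied or converted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical type tags; the real grammar has its own type nodes.
enum FieldType { STRING, I32, I64 }

// Interface proposed in item 2 (single-method variant).
interface TFastSkippable {
    void skip(FieldType type);
}

// Hypothetical field spec carrying the proposed 'skip' attribute (item 1).
final class FieldSpec {
    final String name;
    final FieldType type;
    final boolean skip;

    FieldSpec(String name, FieldType type, boolean skip) {
        this.name = name;
        this.type = type;
        this.skip = skip;
    }
}

// Toy stand-in for a separator-delimited protocol (item 3).
// It is NOT TCTLSeparatedProtocol, only an assumption-laden model of it.
final class DelimitedProtocol implements TFastSkippable {
    private final String row;
    private final char separator;
    private int pos = 0;
    int conversions = 0; // counts fields that were actually parsed

    DelimitedProtocol(String row, char separator) {
        this.row = row;
        this.separator = separator;
    }

    // Skipping only advances the cursor: no substring, no type conversion.
    @Override
    public void skip(FieldType type) {
        int end = row.indexOf(separator, pos);
        pos = (end < 0) ? row.length() : end + 1;
    }

    // Reading copies the token and converts it to the declared type.
    Object read(FieldType type) {
        int end = row.indexOf(separator, pos);
        if (end < 0) {
            end = row.length();
        }
        String token = row.substring(pos, end);
        pos = Math.min(end + 1, row.length());
        conversions++;
        switch (type) {
            case I32: return Integer.valueOf(token);
            case I64: return Long.valueOf(token);
            default:  return token;
        }
    }
}

// Hypothetical analogue of DynamicSerDeFieldList: honors the skip attribute.
final class FieldListReader {
    static List<Object> deserialize(List<FieldSpec> fields, DelimitedProtocol in) {
        List<Object> out = new ArrayList<>();
        for (FieldSpec f : fields) {
            if (f.skip) {
                in.skip(f.type);
                out.add(null); // placeholder keeps column positions stable
            } else {
                out.add(in.read(f.type));
            }
        }
        return out;
    }
}
```

A single skip(type) method is used here for brevity; the per-type variants (skipI32, skipString, ...) would matter for protocols where the field width depends on the type.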
Make DynamicSerDe capable of skipping fields that will not be used in the query
-------------------------------------------------------------------------------
Key: HADOOP-4550
URL: https://issues.apache.org/jira/browse/HADOOP-4550
Project: Hadoop Core
Issue Type: New Feature
Components: contrib/hive
Reporter: Pete Wyckoff
Thrift/DynamicSerDe always deserializes every field in the record and converts it to the correct type. Often, only a few of the fields will be used.
e.g., select foo.user from foo where foo.created < 'today'
where foo is something like
struct {
  string user
  i64 created
  string fullname
  string description
  i32 something
  i32 somethingelse
  ...
}
Parsing fullname, description, something and somethingelse is a waste in this case.
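To make the example concrete, the sketch below computes which declared fields a query never references, which is the information the runtime would need before marking fields as skippable. The class and method names (SkipPlanner, fieldsToSkip) are hypothetical, and how the set of referenced columns is obtained from the query plan is assumed rather than shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: decides which struct fields can be skipped for a query.
final class SkipPlanner {
    // Returns the declared fields that the query does not reference,
    // preserving declaration order.
    static List<String> fieldsToSkip(List<String> declared, Set<String> referenced) {
        List<String> skippable = new ArrayList<>();
        for (String field : declared) {
            if (!referenced.contains(field)) {
                skippable.add(field);
            }
        }
        return skippable;
    }
}
```

For the query above, the declared fields are user, created, fullname, description, something and somethingelse, and the referenced set is {user, created}, so the result would be the four remaining fields.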
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.