PERFORMANCE: improve how data is stored between M-R jobs and between Map and Reduce
-----------------------------------------------------------------------------------

Key: PIG-686
URL: https://issues.apache.org/jira/browse/PIG-686
Project: Pig
Issue Type: Improvement
Affects Versions: types_branch
Reporter: Olga Natkovich
Fix For: types_branch


Currently, there is quite a bit of overhead in how the data is serialized in both cases because type information is stored with each field.

However, most of the time the data has a known and consistent schema, in which case it is sufficient to store the schema once.

This change could substantially decrease the amount of intermediate data generated.
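
To illustrate the idea, here is a minimal sketch comparing the two layouts: a type tag written before every field versus the tags written once as a header. It is not Pig's actual serialization code. The tag values, the function names (`serialize_tagged`, `serialize_with_schema`), and the byte layout are all assumptions made up for this example.

```python
import struct

# Hypothetical one-byte type tags, for illustration only
# (these are not Pig's real DataType codes).
TAG_INT, TAG_DOUBLE, TAG_STR = 1, 2, 3


def _encode_value(value):
    """Return (tag, payload_bytes) for a single field."""
    if isinstance(value, int):
        return TAG_INT, struct.pack(">q", value)
    if isinstance(value, float):
        return TAG_DOUBLE, struct.pack(">d", value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return TAG_STR, struct.pack(">I", len(raw)) + raw
    raise TypeError("unsupported field type: %r" % type(value))


def serialize_tagged(rows):
    """Per-field layout: a type tag precedes every field of every row."""
    out = bytearray()
    for row in rows:
        for value in row:
            tag, payload = _encode_value(value)
            out.append(tag)
            out += payload
    return bytes(out)


def serialize_with_schema(rows):
    """Schema-once layout: tags go in a header, rows carry payloads only.

    Assumes rows is non-empty and every row has the same field types.
    """
    schema = [_encode_value(value)[0] for value in rows[0]]
    out = bytearray([len(schema)]) + bytearray(schema)
    for row in rows:
        for value, expected_tag in zip(row, schema):
            tag, payload = _encode_value(value)
            if tag != expected_tag:
                raise ValueError("row does not match the declared schema")
            out += payload
    return bytes(out)


rows = [(i, i * 0.5, "k%d" % i) for i in range(1000)]
tagged = serialize_tagged(rows)
compact = serialize_with_schema(rows)
print(len(tagged), len(compact))
```

In this sketch the saving is one byte per field minus a small fixed header, so the relative gain depends on how large the field payloads are compared with the tags.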

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Olga Natkovich (JIRA) at Jun 12, 2009 at 12:40 am
    [ https://issues.apache.org/jira/browse/PIG-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich resolved PIG-686.
    --------------------------------

    Resolution: Won't Fix

    We have experimented with this work, and the performance gains (at most 5-7%) are not sufficient to justify the complexity it would add to the code. Hopefully, we will get the improvement once we integrate with Avro.
    PERFORMANCE: improve how data is stored between M-R jobs and between Map and Reduce
    -----------------------------------------------------------------------------------

    Key: PIG-686
    URL: https://issues.apache.org/jira/browse/PIG-686
    Project: Pig
    Issue Type: Improvement
    Affects Versions: 0.2.0
    Reporter: Olga Natkovich

    Currently, there is quite a bit of overhead in how the data is serialized in both cases because type information is stored with each field.
    However, most of the time the data has a known and consistent schema, in which case it is sufficient to store the schema once.
    This change could substantially decrease the amount of intermediate data generated.

Discussion Overview
Group: dev @
Categories: pig, hadoop
Posted: Feb 26, '09 at 7:57p
Active: Jun 12, '09 at 12:40a
Posts: 2
Users: 1
Website: pig.apache.org

1 user in discussion

Olga Natkovich (JIRA): 2 posts
