Hi,

As you know, a lot of work this year went into performance optimization
of Pig. One of the main sources of performance problems is high memory
usage. To address this problem, we propose switching the internal
implementation of strings from Java Strings to Hadoop Text, because
Text has lower memory overhead. Examples (assuming ASCII data; sizes
are in bytes):

String length (chars)  Java String (bytes)  Hadoop Text (bytes)
5                      46                   37
10                     56                   42
20                     76                   52
40                     116                  72
80                     196                  112

The gap widens as strings grow: each additional ASCII character adds
2 bytes to a Java String (stored as UTF-16) but only 1 byte to a Text
object (stored as UTF-8).
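The table follows a simple pattern that a few lines of arithmetic can check: every row matches Java String = 36 + 2n bytes and Hadoop Text = 32 + n bytes for an ASCII string of n characters, so the saving is 4 + n bytes and grows linearly with length. The sketch encodes those two fitted formulas (our reading of the table, not a JVM measurement or an official sizing rule) and uses `String.getBytes` to show where the 2-versus-1 bytes per character comes from. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class StringFootprint {

    // Linear fit to the table rows: 36 bytes of fixed overhead plus
    // 2 bytes per character. Derived from the table, not measured on a JVM.
    static int javaStringBytes(int chars) {
        return 36 + 2 * chars;
    }

    // Linear fit to the table rows: 32 bytes of fixed overhead plus
    // 1 byte per ASCII character. Also derived from the table.
    static int hadoopTextBytes(int chars) {
        return 32 + chars;
    }

    public static void main(String[] args) {
        // Reproduce the table rows and print the per-row saving (4 + n bytes).
        int[] lengths = {5, 10, 20, 40, 80};
        System.out.println("chars  String  Text  saved");
        for (int n : lengths) {
            int s = javaStringBytes(n);
            int t = hadoopTextBytes(n);
            System.out.printf("%5d  %6d  %4d  %5d%n", n, s, t, s - t);
        }

        // Per-character cost of the two encodings for ASCII text.
        // getBytes reports encoded size, not the object's heap footprint.
        String sample = "hello";
        System.out.println(sample.getBytes(StandardCharsets.UTF_16LE).length); // 10
        System.out.println(sample.getBytes(StandardCharsets.UTF_8).length);    // 5
    }
}
```

Running it reproduces the five rows, with savings of 9, 14, 24, 44 and 84 bytes. The fixed overheads (36 and 32 bytes) are only what the table implies; real object sizes depend on the JVM, and JVMs since Java 9 can store Latin-1 strings in one byte per character (compact strings).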

Making this change would have no impact on Pig users; however, it will
affect existing UDFs that work with Strings. Our question is whether
UDF writers/owners are comfortable with the proposed transition and
will update their UDFs.
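The thread does not describe how the UDF interface would change, so the following is only an illustration of the compatibility risk, assuming a UDF that casts its argument directly to `String`. To stay self-contained it avoids Pig and Hadoop dependencies: a `List<Object>` stands in for a Pig `Tuple`, and the hypothetical `TextLike` class stands in for `org.apache.hadoop.io.Text`, mimicking only its UTF-8 storage and its `toString()` decoding. The method names `upperLegacy` and `upperTolerant` are illustrative, not part of Pig's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;

public class UdfMigrationSketch {

    // Hypothetical stand-in for org.apache.hadoop.io.Text: stores UTF-8
    // bytes and decodes them in toString(). Not the real Hadoop class.
    static final class TextLike {
        private final byte[] utf8;

        TextLike(String s) {
            this.utf8 = s.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String toString() {
            return new String(utf8, StandardCharsets.UTF_8);
        }
    }

    // Old-style UDF body: assumes the field is a java.lang.String and
    // throws ClassCastException if it is any other type.
    static String upperLegacy(List<Object> tuple) {
        String s = (String) tuple.get(0);
        return s.toUpperCase(Locale.ROOT);
    }

    // Tolerant UDF body: accepts a String or a Text-like object by going
    // through toString(). Returns null for a null field.
    static String upperTolerant(List<Object> tuple) {
        Object field = tuple.get(0);
        if (field == null) {
            return null;
        }
        return field.toString().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<Object> oldTuple = List.of("pig");
        List<Object> newTuple = List.of(new TextLike("pig"));

        System.out.println(upperLegacy(oldTuple));   // PIG
        System.out.println(upperTolerant(newTuple)); // PIG

        try {
            upperLegacy(newTuple);
        } catch (ClassCastException e) {
            System.out.println("legacy UDF failed: " + e.getClass().getSimpleName());
        }
    }
}
```

A direct `(String)` cast fails as soon as the field arrives as another type, while going through `toString()` works for both. Whether that pattern suits a real migration depends on the details of the change, which the thread leaves for later.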

Please let us know by the end of next week if you strongly object to
this proposal. Otherwise, we will go forward with this plan.

Thanks,

Olga

  • Olga Natkovich at Sep 21, 2009 at 8:01 pm
    Since we have not heard any objections, we are going to proceed with
    this plan. Stay tuned for details on when the change is coming.

    Olga

    -----Original Message-----
    From: Olga Natkovich
    Sent: Friday, September 11, 2009 11:54 AM
    To: pig-dev@hadoop.apache.org; pig-user@hadoop.apache.org
    Subject: proposed changes to Pig UDFs

Discussion Overview
group: user @
categories: pig, hadoop
posted: Sep 11, '09 at 6:56p
active: Sep 21, '09 at 8:01p
posts: 2
users: 1
website: pig.apache.org

1 user in discussion

Olga Natkovich: 2 posts
