Grokbase Groups Pig user October 2010
FAQ
Hey everyone!
In the Pig Journal page (http://wiki.apache.org/pig/PigJournal) says
something about getting statistics for Pig's optimizer. Is there any work
being done on that?
Or are there any other plans to improve the optimizer? I mean now is a rule
based one, are there expectations to change it to a cost based one?
Any opinions or comments are highly appreciated. Thanks!


Renato M.

Search Discussions

  • Alan Gates at Oct 14, 2010 at 6:02 pm
    AFAIK no one is working on that currently. Our next thoughts on
    optimizer improvement were to start using the new optimizer framework
    in the MR optimizer so we can bring some order to the madness of
    visitors that is the MR optimizer. I think Thejas plans on starting
    work on that in 0.9.

    In the long run many people have discussed having a cost based
    optimizer, but I have not seen any proposals of how it should work.
    With the advent of Howl it would seem that at least basic statistic
    based decisions could be made in the optimizer: is one of the files
    small enough to use a replicated join in this case? is the file
    already sorted so we can use a merge join?.

    It would be great if you're interested in working in this area. The
    best way to start is file a JIRA or start a wiki page with general
    approach and design information.

    Alan.
    On Oct 13, 2010, at 7:46 PM, Renato Marroquín Mogrovejo wrote:

    Hey everyone!
    In the Pig Journal page (http://wiki.apache.org/pig/PigJournal) says
    something about getting statistics for Pig's optimizer. Is there any
    work
    being done on that?
    Or are there any other plans to improve the optimizer? I mean now is
    a rule
    based one, are there expectations to change it to a cost based one?
    Any opinions or comments are highly appreciated. Thanks!


    Renato M.
  • Dmitriy Ryaboy at Oct 14, 2010 at 6:32 pm
    As long as the loader can provide statistics (be it Howl loader, or
    something else), we can use stats to make optimizations. There is actually a
    (very simplistic) example of this already in 0.8, in that it will estimate
    number of reducers based on file size if parallelism is not specified for
    join/group/etc operations. It's not using LoadStatistics afaik, that should
    be fixed.

    Ashutosh and I did some work around this about a year ago (on top of the
    visitor madness) that was never really presentable.. would be happy to dig
    up the code and see if any of it is still salvageable.

    -D
    On Thu, Oct 14, 2010 at 11:01 AM, Alan Gates wrote:

    AFAIK no one is working on that currently. Our next thoughts on optimizer
    improvement were to start using the new optimizer framework in the MR
    optimizer so we can bring some order to the madness of visitors that is the
    MR optimizer. I think Thejas plans on starting work on that in 0.9.

    In the long run many people have discussed having a cost based optimizer,
    but I have not seen any proposals of how it should work. With the advent of
    Howl it would seem that at least basic statistic based decisions could be
    made in the optimizer: is one of the files small enough to use a replicated
    join in this case? is the file already sorted so we can use a merge join?.

    It would be great if you're interested in working in this area. The best
    way to start is file a JIRA or start a wiki page with general approach and
    design information.

    Alan.


    On Oct 13, 2010, at 7:46 PM, Renato Marroquín Mogrovejo wrote:

    Hey everyone!
    In the Pig Journal page (http://wiki.apache.org/pig/PigJournal) says
    something about getting statistics for Pig's optimizer. Is there any work
    being done on that?
    Or are there any other plans to improve the optimizer? I mean now is a
    rule
    based one, are there expectations to change it to a cost based one?
    Any opinions or comments are highly appreciated. Thanks!


    Renato M.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categoriespig, hadoop
postedOct 14, '10 at 2:47a
activeOct 14, '10 at 6:32p
posts3
users3
websitepig.apache.org

People

Translate

site design / logo © 2021 Grokbase