Grokbase Groups Pig user August 2011
Kevin Burton at Aug 28, 2011 at 7:12 pm
I'm reading the documentation and it says:

"*Regular Join Optimizations*

Optimization for regular joins ensures that the last table in the join is
not brought into memory but streamed through instead. Optimization reduces
the amount of memory used which means you can avoid spilling the data and
also should be able to scale your query to larger data volumes.

To take advantage of this optimization, make sure that the table with the
largest number of tuples per key is the last table in your query. In some of
our tests we saw 10x performance improvement as the result of this
optimization.".


This seems like it would apply to cogroup too. Does it?
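
For anyone following along, here is a rough Python sketch of the idea in the quoted documentation: for a single join key, every relation except the last is held in memory, while the last one is consumed one tuple at a time. This is only a conceptual model, not Pig's actual reducer code, and `reduce_join` and its parameters are made-up names for illustration.

```python
from itertools import product


def reduce_join(buffered_inputs, streamed_input):
    """Join the tuples of ONE key (hypothetical helper, not Pig code).

    buffered_inputs: list of lists, one per relation except the last;
                     these are held fully in memory.
    streamed_input:  any iterable for the last relation; it is consumed
                     one tuple at a time and never materialized.
    """
    for last in streamed_input:
        # Combine the current streamed tuple with every combination of
        # the in-memory tuples. Memory use depends only on the buffered
        # relations, not on the size of the streamed one.
        for combo in product(*buffered_inputs):
            yield combo + (last,)


# The small relation is buffered; the large one is a generator, so it is
# never fully in memory.
small = [("k", 1), ("k", 2)]
large = (("k", n) for n in range(1_000_000))

rows = reduce_join([small], large)
first_row = next(rows)  # only one tuple of `large` has been read so far
```

In this model, putting the relation with the most tuples per key last keeps the buffered lists short, which is the situation the documentation recommends.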

--

Founder/CEO Spinn3r.com

Location: *San Francisco, CA*
Skype: *burtonator*

Skype-in: *(415) 871-0687*


  • Daniel Dai at Aug 28, 2011 at 7:28 pm
    Yes, but only if you immediately flatten the rightmost relation
    after the cogroup. Otherwise, a bag will be created for it.

    Daniel
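
To illustrate the distinction Daniel draws, here is a small Python sketch under the same kind of simplifying assumptions: a plain cogroup has to build a bag for every input, whereas a cogroup that is immediately followed by a flatten of the rightmost relation can emit one row per tuple of that relation without collecting them first. The function names are hypothetical and this models the behavior for a single key only; it is not how Pig is implemented internally.

```python
def cogroup_materialized(key, left, right):
    """Plain cogroup for one key: both inputs become bags (lists),
    so the rightmost relation is fully held in memory."""
    return (key, list(left), list(right))


def cogroup_then_flatten(key, left, right):
    """Cogroup followed immediately by a flatten of the rightmost
    relation: only `left` is buffered as a bag, while `right` is
    streamed and one output row is produced per streamed tuple."""
    left_bag = list(left)
    for r in right:
        yield (key, left_bag, r)


# The rightmost input can be a lazy iterator in the flattened case.
streamed_rows = cogroup_then_flatten("k", [1, 2], iter([10, 20, 30]))
```

As with a flatten of an empty bag, the second function yields no rows for a key whose rightmost input is empty.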
  • Kevin Burton at Aug 29, 2011 at 6:51 am
    The documentation should probably be updated to reflect this. I guess I
    should probably shut up and submit a patch :-P

    I do have to admit that, as an outside contributor, having a read-only git
    mirror is really sweet. I'm too addicted to our internal branching using
    distributed revision control to want to work in an OSS project without it :-P
  • Dmitriy Ryaboy at Aug 29, 2011 at 9:43 pm
    Just make sure to use "git diff --no-prefix" to generate your patch when
    you upload it to Jira. I work off the Apache git mirror as well.

    D

Discussion Overview
group: user @
categories: pig, hadoop
posted: Aug 28, '11 at 7:12p
active: Aug 29, '11 at 9:43p
posts: 4
users: 3
website: pig.apache.org