Pig user mailing list archive (Grokbase), March 2010
Hi,

I wonder whether it is faster to first extract only the relevant
fields from a bag of tuples before performing other operations on it,
or whether the optimizer handles this automatically.

For example, is this:

ssessions = FOREACH sessions GENERATE imei;
imei_sessions = GROUP ssessions BY imei;
imei_session_count = FOREACH imei_sessions GENERATE group, COUNT(ssessions);

faster than:

imei_sessions = GROUP sessions BY imei;
imei_session_count = FOREACH imei_sessions GENERATE group, COUNT(sessions);

Thanks for your help.

  • Jianyong Dai at Mar 18, 2010 at 10:30 pm
    For bags, you need to project the fields manually. The current optimizer
    does not prune fields inside a bag: once you group a relation into a bag,
    all the fields inside the bag are marked as required. So #1 (projecting
    imei before the GROUP) is faster than #2.

    Daniel
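A minimal sketch of the projected variant (#1) as a complete script, assuming a hypothetical tab-separated input file `sessions.tsv` with hypothetical extra fields `start_time` and `duration`; these exist only to give `sessions` more fields than the grouping key. `EXPLAIN` prints the plans Pig generates for an alias, which can help when comparing this script against the unprojected variant (#2).

```pig
-- Hypothetical input: path, delimiter and schema are assumptions for illustration.
sessions = LOAD 'sessions.tsv' USING PigStorage('\t')
    AS (imei:chararray, start_time:long, duration:long);

-- Project the grouping key first, so the bags built by GROUP hold only imei.
ssessions = FOREACH sessions GENERATE imei;
imei_sessions = GROUP ssessions BY imei;

-- One output tuple per imei value: the group key and COUNT over its bag.
imei_session_count = FOREACH imei_sessions GENERATE group, COUNT(ssessions);

-- Show the plans Pig builds for this alias, then print the result.
EXPLAIN imei_session_count;
DUMP imei_session_count;
```

Running the same script with the `ssessions` line removed and `sessions` used in the GROUP and COUNT gives variant #2; per the reply above, the grouped bags in that version keep every field of `sessions`.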

Discussion Overview
group: user @
categories: pig, hadoop
posted: Mar 18, '10 at 10:23p
active: Mar 18, '10 at 10:30p
posts: 2
users: 2 (Vincent Barat: 1 post; Jianyong Dai: 1 post)
website: pig.apache.org
