Grokbase Groups Pig user April 2012
Subject: ordering tuple after grouping
Posted by Chan, Tim on Apr 16, 2012 at 8:31 pm
Given data:

(1, 55, abc)
(2, 23, asd)
(1, 85, xyz)
(1, 2, aaa)


I would like to group on $0 and then have my grouped tuple be ordered by $1. Is this possible?

The output should look like this:

(1, {(1, 2, aaa),(1,55,abc),(1,85,xyz)})
(2, {(2,23,asd)})


Then I would like to keep the first tuple for every group.

For example:

(1,2,aaa)
(2,23,asd)


  • Gianmarco De Francisci Morales at Apr 16, 2012 at 8:44 pm
    Sure,
    use a nested foreach.

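    grouped = group data by $0;
    ordered = foreach grouped {
        sorted = order data by $1;   -- sort each group's bag on the second field
        first = limit sorted 1;      -- keep only the first tuple of the sorted bag
        generate first;              -- a one-tuple bag; flatten(first) would emit the bare tuple
    }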

    Beware, untested code.

    Cheers,
    --
    Gianmarco
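
For readers less familiar with Pig, the semantics of the nested foreach above can be sketched in Python. This is only an illustration on the sample data from the question, not Pig code; the names `rows` and `first_per_group_sorted` are made up for this sketch.

```python
from collections import defaultdict

# Sample rows from the question: (key, value, label).
rows = [(1, 55, "abc"), (2, 23, "asd"), (1, 85, "xyz"), (1, 2, "aaa")]

def first_per_group_sorted(rows):
    """Group on field 0, sort each group on field 1, keep the first tuple."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)  # corresponds to: group data by $0
    result = []
    for key in sorted(groups):
        ordered = sorted(groups[key], key=lambda r: r[1])  # order data by $1
        result.append(ordered[0])  # limit sorted 1
    return result

print(first_per_group_sorted(rows))
```

Note that every group is fully sorted even though only one tuple per group is kept.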


  • Chan, Tim at Apr 16, 2012 at 11:04 pm
    Dear Gianmarco,

    It works great! Thanks.

    Tim

  • Russell Jurney at Apr 17, 2012 at 12:23 am
    Or even:

    ordered = foreach (group data by $0) { sorted = order data by $1; first = limit sorted 1; generate first; }


    Russell Jurney http://datasyndrome.com

  • Dmitriy Ryaboy at Apr 17, 2012 at 7:47 am
    This works, but isn't the most efficient thing in the world.
    Try using the TOP udf instead.
    http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/builtin/TOP.html

  • Gianmarco De Francisci Morales at Apr 17, 2012 at 8:04 am
    Hi Dmitriy,

    Can you explain what the difference in the execution plan is?
    And if there is a performance difference, shouldn't we try to fix it?

    Cheers,
    --
    Gianmarco


  • Dmitriy Ryaboy at Apr 17, 2012 at 9:51 am
    TOP doesn't need to sort the whole relation; it can be computed in a streaming fashion over any collection (O(n log k), where k << n). Plus it's algebraic (associative), since the top 10 of a set is the top 10 of all the top 10s of a covering collection of subsets.
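
Both properties mentioned here can be sketched in Python rather than Pig. This illustrates the general top-k idea under stated assumptions; it is not a description of how Pig's TOP is implemented, and `top_k`, `data`, and the chunk boundaries are hypothetical.

```python
import heapq

def top_k(values, k):
    """Return the k largest values in one scan (assumes k >= 1).

    A min-heap of at most k items holds the current candidates, so each
    of the n inputs costs at most O(log k): O(n log k) overall, no full sort.
    """
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)  # evict the smallest candidate
    return sorted(heap, reverse=True)

data = [55, 23, 85, 2, 91, 17, 64, 40]

# Algebraic property: take the top k of each chunk, then the top k of
# the combined partial results. The answer matches the top k of the whole.
chunks = [data[:3], data[3:6], data[6:]]
partials = [x for chunk in chunks for x in top_k(chunk, 2)]

print(top_k(data, 2))
print(top_k(partials, 2))
```

Because partial results can be merged this way, the work can be split across subsets of the data and the small partial lists combined afterwards.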

  • Gianmarco De Francisci Morales at Apr 17, 2012 at 10:53 am
    I see, I hadn't understood your suggestion.
    You meant replacing both ORDER and LIMIT with TOP.
    Makes sense, thanks.

    Cheers,
    --
    Gianmarco
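
For the question in this thread (keep only the tuple with the smallest $1 in each group), replacing ORDER plus LIMIT with a top-style aggregate amounts to a single pass that remembers one tuple per key. Below is a minimal Python sketch of that idea on the sample data; `smallest_per_group` is a hypothetical name, and Pig's TOP itself is not being called here.

```python
# Sample rows from the question: (key, value, label).
rows = [(1, 55, "abc"), (2, 23, "asd"), (1, 85, "xyz"), (1, 2, "aaa")]

def smallest_per_group(rows):
    """One pass: per key (field 0), keep the tuple with the smallest field 1."""
    best = {}
    for row in rows:
        key = row[0]
        if key not in best or row[1] < best[key][1]:
            best[key] = row  # new minimum seen for this key
    return [best[key] for key in sorted(best)]

print(smallest_per_group(rows))
```

Before relying on TOP for a minimum, check the linked TOP documentation for the sort direction it uses in your Pig version.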



Discussion Overview
group: user @
categories: pig, hadoop
posted: Apr 16, '12 at 8:31p
active: Apr 17, '12 at 10:53a
posts: 8
users: 4
website: pig.apache.org
