Given data:

(1, 55, abc)
(2, 23, asd)
(1, 85, xyz)
(1, 2, aaa)

I would like to group on $0 and then have the tuples in each group ordered by $1. Is this possible?

The output should look like this:

(1, {(1, 2, aaa),(1,55,abc),(1,85,xyz)})
(2, {(2,23,asd)})

Then I would like to keep the first tuple for every group.

For example:

(1,2,aaa)
(2,23,asd)

• Gianmarco De Francisci Morales at Apr 16, 2012 at 8:44 pm
Sure,
use a nested foreach.

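grouped = group data by $0;
ordered = foreach grouped {
    -- sort the bag of tuples in each group by the second field
    sorted = order data by $1;
    -- keep only the first tuple of the sorted bag
    first = limit sorted 1;
    -- emits a one-tuple bag per group; flatten(first) would emit bare tuples
    generate first;
}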

Beware, untested code.

Cheers,
--
Gianmarco
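
To make the intent of the nested foreach concrete, here is a minimal Python model of the same three steps (group on field 0, order each group by field 1, keep the first tuple). It is only a sketch of the semantics, not Pig code; the function name `first_per_group` is made up for illustration, and the rows are the sample data from the question.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (key, value, label)
data = [(1, 55, "abc"), (2, 23, "asd"), (1, 85, "xyz"), (1, 2, "aaa")]


def first_per_group(rows):
    """Group rows by field 0, order each group by field 1, keep the first tuple."""
    result = []
    # itertools.groupby only merges adjacent rows, so sort by the key first
    by_key = sorted(rows, key=itemgetter(0))
    for _key, group in groupby(by_key, key=itemgetter(0)):
        ordered = sorted(group, key=itemgetter(1))  # models: order data by $1
        result.append(ordered[0])                   # models: limit sorted 1
    return result


print(first_per_group(data))  # prints [(1, 2, 'aaa'), (2, 23, 'asd')]
```

Note that this model returns bare tuples, which matches the output requested in the question.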

• Chan, Tim at Apr 16, 2012 at 11:04 pm
Dear Gianmarco,

It works great! Thanks.

Tim

• Russell Jurney at Apr 17, 2012 at 12:23 am
Or even:

ordered = foreach (group data by $0) { sorted = order data by $1; first = limit sorted 1; generate first; }

Russell Jurney http://datasyndrome.com

• Dmitriy Ryaboy at Apr 17, 2012 at 7:47 am
This works, but it isn't the most efficient approach.
Try using the TOP UDF instead.
http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/builtin/TOP.html

• Gianmarco De Francisci Morales at Apr 17, 2012 at 8:04 am
Hi Dmitriy,

Can you explain what the difference is in the execution plan?
And if there is a performance difference, shouldn't we try to fix it?

Cheers,
--
Gianmarco

• Dmitriy Ryaboy at Apr 17, 2012 at 9:51 am
TOP doesn't need to sort the whole relation; it can be computed in a streaming fashion over any collection in O(n log k) time, where k << n. It is also algebraic (associative): the top 10 of a set is the top 10 of all the top 10s of a covering collection of subsets.
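
The two properties described here can be sketched in Python. This is an illustration of the general technique, not Pig's actual TOP implementation: `top_k` and `merge_top_k` are hypothetical names, the sketch keeps the k smallest values to match the ascending order used in this thread, and the extra rows `qqq` and `zzz` are invented sample data.

```python
import heapq
from operator import itemgetter


def top_k(rows, k, field=1):
    """Keep the k rows with the smallest value in `field`, in a single pass.

    The heap never holds more than k entries, so the cost is O(n log k)
    rather than the O(n log n) of sorting everything.
    """
    if k <= 0:
        return []
    heap = []  # entries are (-value, index, row); heap[0] is the worst row kept
    for i, row in enumerate(rows):
        if len(heap) < k:
            heapq.heappush(heap, (-row[field], i, row))
        elif row[field] < -heap[0][0]:
            # the new row beats the worst kept row, so swap it in
            heapq.heapreplace(heap, (-row[field], i, row))
    return sorted((entry[2] for entry in heap), key=itemgetter(field))


def merge_top_k(partials, k, field=1):
    """Algebraic step: the top k of the union of per-subset top-k results."""
    return top_k([row for part in partials for row in part], k, field)


rows = [(1, 55, "abc"), (1, 85, "xyz"), (1, 2, "aaa"), (1, 40, "qqq"), (1, 7, "zzz")]
parts = [rows[:2], rows[2:]]  # hypothetical split of the rows into two subsets
direct = top_k(rows, 2)
combined = merge_top_k([top_k(part, 2) for part in parts], 2)
print(direct == combined)  # prints True
```

Because each subset can be reduced to its own top k before anything is combined, much less data has to be carried forward than with a full sort of every group.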

• Gianmarco De Francisci Morales at Apr 17, 2012 at 10:53 am
You meant replacing both ORDER and LIMIT with TOP.
Makes sense, thanks.

Cheers,
--
Gianmarco
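
For the case in the question, where only one tuple per group is kept, replacing the sort and the limit with a top-1 selection means a single pass can remember the best row seen so far for each key. The Python sketch below models that idea; it is not Pig code, and `first_per_group_streaming` is a hypothetical name. It assumes "first" means the smallest value of field 1, as in the question.

```python
def first_per_group_streaming(rows, key_field=0, order_field=1):
    """One pass over the rows: keep, per key, the row with the smallest order_field."""
    best = {}
    for row in rows:
        key = row[key_field]
        # replace the remembered row only if this one sorts earlier
        if key not in best or row[order_field] < best[key][order_field]:
            best[key] = row
    # return the winners ordered by key for readable output
    return [best[key] for key in sorted(best)]


data = [(1, 55, "abc"), (2, 23, "asd"), (1, 85, "xyz"), (1, 2, "aaa")]
print(first_per_group_streaming(data))  # prints [(1, 2, 'aaa'), (2, 23, 'asd')]
```

No group is ever sorted or fully materialized here; memory grows with the number of distinct keys rather than with the size of the largest group.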

Discussion Overview
group: user
categories: pig, hadoop
posted: Apr 16, '12 at 8:31p
active: Apr 17, '12 at 10:53a
posts: 8
users: 4
website: pig.apache.org
