Hi!

I hope this is not too newbie a question, but it's driving me crazy... How do
you count the records in a relation? Like DUMP, but instead of a list of
records, I would like their count.

Thanks,

Anze

• Gerrit Jansen van Vuuren at Oct 29, 2010 at 11:22 am
Hi,

Let's say you have a file with the columns userid, username, location, amount.

To count the total number of users:
amount:long);
G = GROUP A ALL PARALLEL 40;
R = FOREACH G GENERATE COUNT($1);

dump R;
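
For reference, the total-count example can be written out as a complete script. This is only a sketch: the input path `users.tsv`, the default tab-delimited loader, and the `chararray` types for the first three columns are assumptions for illustration; only `amount:long` comes from the post.

```pig
-- Hypothetical input path and column types; only amount:long is from the post.
A = LOAD 'users.tsv' AS (userid:chararray, username:chararray, location:chararray, amount:long);

-- GROUP ... ALL puts every record of A into a single group.
G = GROUP A ALL;

-- In G, $0 is the group key and $1 is the bag of records, which can also be addressed by name as A.
R = FOREACH G GENERATE COUNT(A) AS total;

DUMP R;
```

Referring to the bag as `A` instead of `$1` is equivalent here and may be easier to read.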

To count the number of users by location:

amount:long);
G = GROUP A BY location PARALLEL 40;
R = FOREACH G GENERATE FLATTEN(group), COUNT($1);

dump R;

To get the record count and the sum of amount per (location, userid):

amount:long);
G = GROUP A BY (location, userid) PARALLEL 40;
R = FOREACH G GENERATE FLATTEN(group), COUNT($1) as usercount,
SUM($1.amount) as useramount;

NOTE: PARALLEL is set to 40 only as an example. Choose the value yourself;
it depends on your cluster setup, data, etc.

To count, it's always GROUP, either ALL or BY <column name>, then FOREACH
and GENERATE COUNT($1). Here $1 is the second field of the grouped relation:
the bag that holds the grouped records.
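
The same GROUP-then-FOREACH pattern can also be written with field names instead of positions. The sketch below assumes a hypothetical input path and `chararray` types for the first three columns. It also shows `COUNT_STAR`, which is only available if your Pig version provides it: `COUNT` skips tuples whose first field is null, while `COUNT_STAR` counts them too.

```pig
-- Hypothetical input path and column types, used only for illustration.
A = LOAD 'users.tsv' AS (userid:chararray, username:chararray, location:chararray, amount:long);

G = GROUP A BY location;

-- group is the key ($0); A is the bag of grouped records ($1).
R = FOREACH G GENERATE
        group         AS location,
        COUNT(A)      AS usercount,    -- ignores tuples whose first field is null
        COUNT_STAR(A) AS rowcount,     -- counts every tuple in the bag
        SUM(A.amount) AS totalamount;

DUMP R;
```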

Hope this helps,

-----Original Message-----
From: Anze
Sent: Friday, October 29, 2010 12:01 PM
To: user@pig.apache.org
Subject: relations count
• Anze at Oct 29, 2010 at 12:44 pm
Thanks, that helps a lot! :)

Anze

On Friday 29 October 2010, Gerrit Jansen van Vuuren wrote:

Discussion Overview
group: user
categories: pig, hadoop
posted: Oct 29, '10 at 11:01a
active: Oct 29, '10 at 12:44p
posts: 3
users: 2
website: pig.apache.org