Grokbase Groups Pig user March 2009
Hello Pig list,

I have looked at the 'distinct' keyword, but it does not seem to operate on particular fields (columns). I have a file with several categorical variables, a1-a3, and want to count the distinct values of fields a2 and a3 for each value of field a1.

How can I get distinct counts?

For example:
A = load 'test.csv' using PigStorage(',') as (a1,a2,a3);
/*
dump A;
(x, X, a)
(y, Y, b)
(x, XX, b)
(z, Z, c)
(w, X, )
(, W, d)
(x, , b)
*/

B = group A by $0;
/*
dump B;
(, {(, W, d)})
(w, {(w, X, )})
(x, {(x, X, a), (x, XX, b), (x, , b)})
(y, {(y, Y, b)})
(z, {(z, Z, c)})
*/


-- How do I get distinct counts of a2 and a3 by $0?
/*
Desired output:
,1,1
w,1,1
x,3,2
y,1,1
z,1,1
*/
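For reference, here is a plain-Python sketch of the same computation (not Pig) that reproduces the desired output. It groups the rows by a1, collects the distinct a2 and a3 values in each group, and counts them. It assumes empty fields arrive as empty strings and count as a value, which is what the desired output implies (w has exactly one distinct a3, the empty one). The name `distinct_counts` is hypothetical. In Pig itself the usual approach is a nested FOREACH over B that applies DISTINCT to the projected bags A.a2 and A.a3 and counts the results. Whether empty fields load as nulls, and whether COUNT skips nulls, may depend on the Pig version.

```python
from collections import defaultdict

# Sample rows from the post, as (a1, a2, a3). Empty fields are kept as
# empty strings so that they count as a value, matching the desired output.
ROWS = [
    ("x", "X", "a"),
    ("y", "Y", "b"),
    ("x", "XX", "b"),
    ("z", "Z", "c"),
    ("w", "X", ""),
    ("", "W", "d"),
    ("x", "", "b"),
]


def distinct_counts(rows):
    """Group rows by a1 and count the distinct a2 and a3 values per group."""
    # Each group key maps to a pair of sets: distinct a2 values, distinct a3 values.
    seen = defaultdict(lambda: (set(), set()))
    for a1, a2, a3 in rows:
        a2_values, a3_values = seen[a1]
        a2_values.add(a2)
        a3_values.add(a3)
    # Sort by group key so the order matches the dump of B in the post.
    return [(key, len(a2s), len(a3s)) for key, (a2s, a3s) in sorted(seen.items())]


if __name__ == "__main__":
    for key, n2, n3 in distinct_counts(ROWS):
        print(f"{key},{n2},{n3}")
```

Running it prints the five lines of the desired output, one group per line.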


Many thanks,
Avram

Discussion Overview
group: user @
categories: pig, hadoop
posted: Mar 18, '09 at 8:44p
active: Mar 18, '09 at 11:30p
posts: 5
users: 3
website: pig.apache.org
