Grokbase Groups Pig user July 2010
FAQ
This worked for me,

joined = JOIN Bytes1 BY Group1, Bytes2 BY Group2;

end = FOREACH joined GENERATE $0 AS something, $1/$3 AS ratio;

Looks similar to your situation.

-----Original Message-----
From: Brian Adams
Sent: Tuesday, July 06, 2010 6:20 PM
To: pig-user@hadoop.apache.org
Subject: RE: Sum Null Column

Matt,

Haven't tried that route yet. I can give that one a go.
On Tue, 2010-07-06 at 18:23 -0400, Matthew Smith wrote:
Shouldn't you have to do it in another table? Add one more step?

fullprep = FOREACH joined GENERATE $0 AS smsgram,$1 AS smscount,$2 AS
webgram,$3 AS webcount;

addedStep = FOREACH fullprep $0,$1,$2,$3, SUM(fullprep.$1 + fullprep.$3) as sumCount;

Just my 2c


-----Original Message-----
From: Brian Adams
Sent: Tuesday, July 06, 2010 6:11 PM
To: pig-user@hadoop.apache.org
Subject: Re: Sum Null Column

The Current line I have is this:
fullprep = FOREACH joined GENERATE $0 AS smsgram,$1 AS smscount,$2 AS
webgram,$3 AS webcount,SUM((int)$1,(int)$3) AS sumcount;

I get the error:
ERROR 1045: Could not infer the matching function for
org.apache.pig.builtin.SUM as multiple or none of them fit. Please use
an explicit cast.

I tried the conditional earlier and it didnt seem to work.
On Tue, 2010-07-06 at 15:04 -0700, Dmitriy Ryaboy wrote:
you can run a replacement loop if you want:

zeroes = foreach foo generate ( x is null ? 0 : x) as x;

or if you are dealing with empties instead of nulls
zeroes = foreach foo generate (IsEmpty(x) ? 0 : x) as x;
On Tue, Jul 6, 2010 at 2:57 PM, Wilkes, Chris wrote:

SUM($1,$3) doesn't work? If the column is a non-double then it is
converted to a 0.




On Jul 6, 2010, at 2:47 PM, Brian Adams wrote:

So I asked a question earlier, but figured it wasn't very clear and thus
less likely to get answered, so here goes.

I have 2 sources with seperate ngrams and counts and after doing a full
out join i get:

Example with numbers:
dog,10,dog,20
cat,5,null/empty,null/empty
null/empty, null/empty,mouse,10

If i want to sum up $1 with $3 the first row would give me
dog,10,dog,20,30

However, when I cat the results out, the second row looks like
cat,5,emtpy,emtpy,empty

And Lastly, the 3rd appears like
empty,empty,mouse,10,emtpy

So if there is not a count in position $1 or $3, then I cannot add them
together.

How do I replace the emtpy with a 0 so I can perform the summation in
the last column?

Desired Output:
dog,10,dog,20,30
cat,5,emtpy,0,5
empty,0,mouse,10,10

Thanks guys.



Search Discussions

Discussion Posts

Previous

Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 8 of 9 | next ›
Discussion Overview
groupuser @
categoriespig, hadoop
postedJul 6, '10 at 9:47p
activeJul 6, '10 at 10:29p
posts9
users4
websitepig.apache.org

People

Translate

site design / logo © 2021 Grokbase