Grokbase Groups Pig user April 2010
Thanks. I did so, but I probably did it wrong. Couldn't make it work.
On Fri, Apr 2, 2010 at 1:49 PM, hc busy wrote:

.... yeah, you have to implement the outputSchema() method on the UDF in
order to make the contents of the tuple visible... There's a nice example
in the UDF Manual:

http://hadoop.apache.org/pig/docs/r0.6.0/udf.html

Search for 'package myudf' until you find it.



On Fri, Apr 2, 2010 at 12:52 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
Not sure if this is exactly the same, but when I've created tuples within
tuples in UDFs (to preserve order of pairs), from bag input, Pig has
allowed it - but I can't work with that data in subsequent steps.
On Fri, Apr 2, 2010 at 12:37 PM, hc busy wrote:

Yeah, I'm sure it has nested tuples. Pig doesn't natively support
introducing tuples like this:

h = foreach g generate ((x,y,z)), (x), ((((x))))

That doesn't work, but I have a UDF that does that.... don't ask why....,
and I've seen it print a double pair of parens when I did a dump.

Our hadoop guys here say it's CDH2 and that the "upgrade" was just a
re-installation of CDH2... ("same jars"). But certainly my script suddenly
started doing weird things when it flattened that all the way through.

I'd support the prior behavior as well, because that seems to match my
reading of the documentation on the behavior of FLATTEN.



Has anybody else had this problem with recent cloudera/pig versions?


thnx!!


On Fri, Apr 2, 2010 at 11:43 AM, zaki rahaman <zaki.rahaman@gmail.com> wrote:
Stupid question, but are you sure your bag has the dual sets of
parentheses? (And if I may ask, why is that the case?)

On Fri, Apr 2, 2010 at 2:11 PM, zaki rahaman <zaki.rahaman@gmail.com> wrote:
If I'm not mistaken, the output is the expected behavior. Flatten should
unnest bags. I'm assuming your statement is something like
FOREACH ... GENERATE field1, field2, FLATTEN(bag1), which would 'duplicate'
the first two fields of a tuple for every tuple in the nested bag.
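
To make the 'duplicate' behavior described above concrete, here is a minimal Python sketch that models it outside of Pig. It is only an illustration: a bag is represented as a list of tuples, and flatten_bag is a hypothetical helper, not a Pig API.

```python
def flatten_bag(row, bag_index):
    """Model of FLATTEN on a bag field (hypothetical helper, not Pig code).

    Emits one output row per tuple in the bag. The other fields of the
    row are repeated, and the fields of the bag tuple are spliced in
    where the bag used to be.
    """
    prefix = row[:bag_index]
    suffix = row[bag_index + 1:]
    return [prefix + tuple(t) + suffix for t in row[bag_index]]


# One input row with two scalar fields and a bag of three tuples.
row = ("id", "data", [(1, 2), (2, 3), (4, 5)])

for out in flatten_bag(row, 2):
    print(out)
```

Each of the three output rows starts with 'id', 'data', followed by the fields of one tuple from the bag.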



On Fri, Apr 2, 2010 at 2:02 PM, hc busy wrote:

doh!!!! s/map/bag/g

I seem to get maps and bags mixed up for some reason...

Guys, I have a row containing a *bag*

'id','data', {((1,2)), ((2,3)), ((4,5))}

What is the expected behavior when I flatten on that bag? I had expected
it to result in

'id','data', (1,2)
'id','data', (2,3)
'id','data', (4,5)


But it appears to me that the result of applying FLATTEN to that bag is
this instead:

'id','data', 1,2
'id','data', 2,3
'id','data', 4,5


The latter is returned by Cloudera's current CDH2, and I've seen the
former behavior on other versions of Pig.

Which is the correct behavior by design?

What will pig 0.6 do when it is released?

thanks!
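
The two outcomes in the question above differ only in how many levels of nesting get removed. The Python sketch below models both, as an illustration only: bags are lists of tuples, and flatten_once and flatten_deep are hypothetical names, not Pig functions. It makes no claim about how any Pig or CDH release is implemented.

```python
def flatten_once(row, bag_index):
    """Remove one level: splice each bag tuple's fields into the row."""
    prefix, suffix = row[:bag_index], row[bag_index + 1:]
    return [prefix + tuple(t) + suffix for t in row[bag_index]]


def unwrap(t):
    """Recursively unwrap nested tuples into a single flat tuple."""
    out = ()
    for field in t:
        out += unwrap(field) if isinstance(field, tuple) else (field,)
    return out


def flatten_deep(row, bag_index):
    """Hypothetical variant: also unwrap tuples nested inside bag tuples."""
    prefix, suffix = row[:bag_index], row[bag_index + 1:]
    return [prefix + unwrap(t) + suffix for t in row[bag_index]]


# The bag from the question: every bag tuple holds one field, itself a tuple.
row = ("id", "data", [((1, 2),), ((2, 3),), ((4, 5),)])

print(flatten_once(row, 2))  # inner tuples survive: the result that was expected
print(flatten_deep(row, 2))  # inner tuples unwrapped: the result that was observed
```

With one level removed, each row still carries an inner (1,2)-style tuple; with recursive unwrapping, the row ends in bare fields.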
On Fri, Apr 2, 2010 at 11:29 AM, hc busy wrote:

Guys, I have a row containing a map

'id','data', {((1,2)), ((2,3)), ((4,5))}

What is the expected behavior when I flatten on that bag? I had expected
it to result in

'id','data', (1,2)
'id','data', (2,3)
'id','data', (4,5)


But it appears to me that the result of applying FLATTEN to that bag is
this instead:

'id','data', 1,2
'id','data', 2,3
'id','data', 4,5


The latter is returned by Cloudera's current CDH2, and I've seen the
former behavior on other versions of Pig.

Which is the correct behavior by design?

What will pig 0.6 do when it is released?

thanks!


--
Zaki Rahaman

