Hi,

I need to divide a large bag into 10 smaller bags of equal size. Does
anyone know of a function that can do this easily? I've had a look at the
standard functions and the PiggyBank and can't find anything appropriate.

Thanks,
James

• Apr 11, 2012 at 4:09 pm
James, you would have to write a UDF for this.
• Apr 11, 2012 at 4:11 pm
Hey James,

Have you looked at LinkedIn's collection of UDFs, DataFu?

In particular, they have a UDF called BagSplit. It might not do exactly
what you want, since it splits a bag into bags of size n rather than into
10 equal-sized bags, but it shouldn't be too hard to write your own UDF
using BagSplit.java as a reference.

Dan F.
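
The "10 equal-sized bags" logic that such a custom UDF would need can be sketched in plain Java. This is only an illustration of the partitioning step, not DataFu's code: the class name EqualSplit and the method split are hypothetical, and a real Pig UDF would extend EvalFunc and operate on DataBag and Tuple rather than on java.util.List.

```java
import java.util.ArrayList;
import java.util.List;

public class EqualSplit {
    // Splits items into exactly `parts` sublists whose sizes differ by at most one.
    // Hypothetical helper for illustration; not part of Pig or DataFu.
    public static <T> List<List<T>> split(List<T> items, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        int base = items.size() / parts;   // minimum size of every part
        int extra = items.size() % parts;  // the first `extra` parts get one more item
        List<List<T>> result = new ArrayList<>(parts);
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int size = base + (i < extra ? 1 : 0);
            // Copy the slice so each part is independent of the input list.
            result.add(new ArrayList<>(items.subList(start, start + size)));
            start += size;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 23; i++) {
            items.add(i);
        }
        // Print the size and contents of each of the 10 parts.
        for (List<Integer> part : split(items, 10)) {
            System.out.println(part.size() + " " + part);
        }
    }
}
```

Unlike splitting into bags of a fixed size n, this always yields the requested number of parts, at the cost of some parts holding one more item than others when the total is not divisible by the part count.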

• Apr 11, 2012 at 4:21 pm
Hi Dan,

Thanks for the recommendation. I did manage to use BagSplit, but does
anyone know the best way of accessing the result returned by BagSplit?

BagSplit returns a bag of bags. What is the best Pig Latin to access a bag
inside another bag?

When I run DESCRIBE on what is returned by BagSplit, I get:

{datafu.pig.bags.bagsplit_J_1979: {(data: {(A::group:chararray,A::tagcount: long)})}}

Thanks,
James
• Apr 11, 2012 at 7:57 pm
If you calculate the size of the large bag, you can use that value as a
scalar, divide it by the number of bags you want, and round the result to
get the size of each smaller bag.

Don't ask me to write that code though :)

Russell Jurney
russell.jurney@gmail.com
datasyndrome.com
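
The arithmetic described above can be sketched as follows, assuming the rounding is done upward (ceiling division) so that no more than the requested number of bags is produced. The class name ChunkSize and both method names are hypothetical and only illustrate the calculation; the resulting size is what would be handed to a fixed-size splitter such as BagSplit.

```java
public class ChunkSize {
    // Ceiling division: the smallest chunk size for which `parts` chunks
    // are enough to hold `total` items. Hypothetical helper for illustration.
    public static long chunkSize(long total, int parts) {
        if (total < 0 || parts <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and parts > 0");
        }
        return (total + parts - 1) / parts;
    }

    // Number of chunks actually produced when `total` items are cut
    // into chunks of `size` items (the last chunk may be smaller).
    public static long chunkCount(long total, long size) {
        if (total < 0 || size <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and size > 0");
        }
        return (total + size - 1) / size;
    }

    public static void main(String[] args) {
        long[] totals = {1000, 95, 11};
        for (long total : totals) {
            long size = chunkSize(total, 10);
            System.out.println(total + " items: chunk size " + size
                    + ", chunks produced " + chunkCount(total, size));
        }
    }
}
```

For bags much larger than the target count this gives exactly 10 chunks, with the last one possibly smaller. For very small bags it can give fewer than 10: with 11 items the chunk size is 2, which yields only 6 chunks.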

Discussion Overview
 group user categories pig, hadoop posted Apr 11, '12 at 3:53p active Apr 11, '12 at 7:57p posts 5 users 4 website pig.apache.org