Grokbase Groups Pig user April 2012
Hi,

I need to divide a large bag into 10 smaller bags of equal size. Does
anyone know of a function that can do this easily? I've had a look at the
standard functions and the PiggyBank and can't find anything appropriate.

Thanks,
James


  • Prashant Kommireddi at Apr 11, 2012 at 4:09 pm
    James, you would have to write a UDF for this.
  • Dan Feldman at Apr 11, 2012 at 4:11 pm
    Hey James,

    Have you looked at LinkedIn's collection of UDFs, DataFu (
    http://engineering.linkedin.com/open-source/introducing-datafu-open-source-collection-useful-apache-pig-udfs
    )?

    In particular, they have a UDF called BagSplit (
    https://github.com/linkedin/datafu/blob/master/src/java/datafu/pig/bags/BagSplit.java).
    It might not do exactly what you want since it splits a bag into bags of
    size n, not into 10 equal-sized bags, but it shouldn't be too hard to write
    your own UDF using BagSplit.java as a reference.

    Dan F.
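
A minimal sketch of the partitioning logic such a custom UDF might need, assuming "equal size" means bag sizes that differ by at most one element. This is plain Java over a `List`, not a Pig UDF and not DataFu's code: the class name `EqualParts` and the method `split` are hypothetical, and a real UDF would work with Pig's bag and tuple types instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: partitioning logic in plain Java, not a Pig UDF. */
final class EqualParts {

    /**
     * Splits items into exactly `parts` consecutive sublists whose sizes
     * differ by at most one. The first (size % parts) sublists get one
     * extra element each.
     */
    static <T> List<List<T>> split(List<T> items, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        int base = items.size() / parts;   // minimum size of every part
        int extra = items.size() % parts;  // number of parts that get one more
        List<List<T>> result = new ArrayList<>(parts);
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int size = base + (i < extra ? 1 : 0);
            result.add(new ArrayList<>(items.subList(start, start + size)));
            start += size;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 23; i++) {
            items.add(i);
        }
        // 23 items into 10 parts: three parts of 3, then seven parts of 2.
        for (List<Integer> part : split(items, 10)) {
            System.out.println(part);
        }
    }
}
```

Unlike splitting by a fixed size n, this always returns the requested number of parts, some of which are empty when there are fewer items than parts.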


  • James Newhaven at Apr 11, 2012 at 4:21 pm
    Hi Dan,

    Thanks for the recommendation. I did manage to use BagSplit, but does
    anyone know the best way of accessing the result returned by BagSplit?

    BagSplit returns a bag of bags. What is the best Pig Latin to access a bag
    inside another bag?

    When I run DESCRIBE on what is returned by BagSplit, I get:

    {datafu.pig.bags.bagsplit_J_1979: {(data: {(A::group:chararray,A::tagcount:long)})}}

    Thanks,
    James
  • Russell Jurney at Apr 11, 2012 at 7:57 pm
    If you calculate the size of the bag, you can use this value as a
    scalar, divide it by the number of bags you want, and round.

    Don't ask me to write that code though :)

    Russell Jurney
    twitter.com/rjurney
    russell.jurney@gmail.com
    datasyndrome.com
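
The arithmetic suggested above can be sketched as follows, assuming the rounded quotient is then used as the fixed bag size for a size-n split like the one BagSplit is described as doing. This is plain Java, not Pig Latin or DataFu code: the class `ChunkBySize` and its methods are hypothetical names, and rounding up (rather than to nearest) is an assumption made here so that no more than the requested number of bags is produced.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: "size divided by bag count, rounded" in plain Java. */
final class ChunkBySize {

    /** Ceiling of total / parts, never less than 1. */
    static int chunkSize(int total, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        return Math.max(1, (total + parts - 1) / parts);
    }

    /** Cuts items into consecutive chunks of `size`; the last may be shorter. */
    static <T> List<List<T>> chunk(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            result.add(new ArrayList<>(items.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            items.add(i);
        }
        int size = chunkSize(items.size(), 10); // 100 items over 10 bags gives 10 per bag
        System.out.println(chunk(items, size).size() + " bags of up to " + size);
    }
}
```

One caveat with this approach: the number of bags only matches the target when the sizes work out. With 23 items and a target of 10 bags, the rounded-up size is 3, which yields 8 bags rather than 10.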
