I am using PIG and this is what I am trying to do:

1) Sort a relation A into B by a field x. The smallest value of x is first.
Just use ORDER ... BY.

2) Label each tuple in B with a number denoting its order in the sorted
relation. So the first tuple would be labeled with a 1, the second tuple
with a 2, the third with a 3 and so on. Not certain how to do this.

3) Derive a relation C where each row is a bag of tuples. The first row
contains the first n1 tuples from relation B, the second row contains the
tuples from B labeled (n1 + 1) to n2, the third row contains the tuples
from B labeled (n2 + 1) to n3, and so on up to n100. This step is simple (just
use FILTER) once each tuple in B is labeled with a number.
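To pin down what steps 1 to 3 ask for, here is a single-machine Python sketch. It is not Pig code, and the dict records and the `sort_and_label` / `bin_by_labels` names are hypothetical, chosen only for illustration. Instead of one FILTER per range, it assigns each label to its range with a binary search over the boundaries, which produces the same grouping in a single pass. What happens to tuples labeled above the last boundary is not specified in the question; this sketch assumes they are dropped.

```python
from bisect import bisect_left


def sort_and_label(records, key="x"):
    """Steps 1 and 2: sort ascending on `key`, then attach 1-based ordinals.

    sorted() is stable, so tuples with equal keys keep their input order and
    still receive distinct, consecutive labels.
    """
    ordered = sorted(records, key=lambda r: r[key])
    return [(label, r) for label, r in enumerate(ordered, start=1)]


def bin_by_labels(labeled, boundaries):
    """Step 3: group (label, record) pairs into bags.

    `boundaries` is the ascending list [n1, n2, ..., nk]. Bag 0 receives
    labels 1..n1, bag 1 receives n1+1..n2, and so on. Labels above nk are
    dropped (an assumption of this sketch).
    """
    bags = [[] for _ in boundaries]
    for label, rec in labeled:
        i = bisect_left(boundaries, label)  # index of first boundary >= label
        if i < len(bags):
            bags[i].append(rec)
    return bags


if __name__ == "__main__":
    A = [{"x": 7}, {"x": 2}, {"x": 5}, {"x": 9}, {"x": 1}]
    B = sort_and_label(A)
    C = bin_by_labels(B, [2, 4])
    print(B)
    print(C)
```

With boundaries `[2, 4]`, the five sample tuples land in two bags holding the `x` values 1, 2 and 5, 7; the tuple labeled 5 lies past n2 and is dropped. On a cluster, step 2 is the hard part, because no single task sees the whole sorted relation.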

The question: how do I do step 2?

thanks
--
View this message in context: http://old.nabble.com/PIG-bin-labeling-relation-tp26443615p26443615.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


  • Dmitriy Ryaboy at Nov 21, 2009 at 8:04 pm
    Unless you actually need the ordinal numbers (roughly equal-sized bins are enough), you can do it all in one step:
    B = ORDER A BY x PARALLEL 100;
    STORE B INTO '......';

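    This creates 100 part files that are ordered relative to each other: the
    first part file holds roughly the first hundredth of the data (the smallest
    values of x), the second holds the next hundredth, and so on, and each file
    is sorted internally. The sizes are approximate because Pig picks the split
    points from a sample of the data, so some files may be slightly bigger than
    others; for a large enough dataset they should be roughly equal.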
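    The behaviour described above can be imitated in memory. The following Python sketch is a simplified model of a sampled range partition, not Pig's actual implementation: it draws a random sample of the keys, takes `parts - 1` evenly spaced cut points from it, routes every key to the bucket whose range contains it, and sorts each bucket locally, as each reducer would. If the ordinal numbers turn out to be needed after all, `global_labels` shows a common two-pass approach: count each bucket, then offset each bucket's local labels by the total size of the buckets before it. The function names and the `sample_size` default are hypothetical.

```python
import random
from bisect import bisect_right


def range_partition(keys, parts, sample_size=1000, seed=0):
    """Split `keys` into `parts` buckets that are ordered relative to each other.

    Cut points come from a random sample, so bucket sizes are only roughly
    equal; each bucket is then sorted locally.
    """
    if not keys:
        return [[] for _ in range(parts)]
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # parts - 1 cut points at evenly spaced positions in the sorted sample
    cuts = [sample[(i * len(sample)) // parts] for i in range(1, parts)]
    buckets = [[] for _ in range(parts)]
    for k in keys:
        # bucket j holds keys with cuts[j-1] <= k < cuts[j]
        buckets[bisect_right(cuts, k)].append(k)
    return [sorted(b) for b in buckets]


def global_labels(buckets):
    """Assign 1-based ordinals across buckets: first pass counts, second labels."""
    offsets, total = [], 0
    for b in buckets:  # pass 1: running total of bucket sizes
        offsets.append(total)
        total += len(b)
    return [
        (off + i, k)  # pass 2: local position plus bucket offset
        for off, b in zip(offsets, buckets)
        for i, k in enumerate(b, start=1)
    ]


if __name__ == "__main__":
    keys = list(range(10_000))
    random.Random(1).shuffle(keys)
    buckets = range_partition(keys, 100)
    print(min(map(len, buckets)), max(map(len, buckets)))
    labeled = global_labels(buckets)
    print(labeled[:3])
```

    Because the cut points come from a sample, the bucket sizes printed by the demo vary around 100 instead of being exactly 100, which matches the "roughly equal" caveat above.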

    -D
    In reply to drd_ (Fri, Nov 20, 2009 at 1:18 PM).

Discussion Overview
group: common-user @
categories: hadoop
posted: Nov 20, '09 at 6:18p
active: Nov 21, '09 at 8:04p
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

Drd_: 1 post, Dmitriy Ryaboy: 1 post
