Hi,

I am running a simple inverted-index program in Hadoop that emits every word in a text file along with its offsets.
So the output key is Text and the output value is a list of LongWritable.

What I am trying to do is sort the offsets in the reduce function. For each key, I put every value into a List and sort it using Collections.sort().

This is the code snippet:

    offsetList.clear();
    for (LongWritable val : values)
    {
        offsetList.add(val);
    }
    Collections.sort(offsetList);

    for (LongWritable offset : offsetList)
    {
        ......
    }

But it doesn't work. Looks like all the elements in offsetList have been overwritten by the smallest value in values. offsetList and values have the same size.
Can I sort the data in this way?

Thanks.

  • Harsh J at Jan 30, 2011 at 10:43 am
    The reduce's value iterator hands back a reference to a single object
    that is reused on every iteration, so a list built by adding val
    directly ends up holding many references to that same object. If you
    must build an entire collection in memory to sort (you could explore
    how MapReduce itself can help sort with comparators/groupers, which is
    more efficient), store a copy of each value rather than the value
    object itself, for example new LongWritable(val.get()).
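
    The reuse described above can be reproduced without a cluster. The
    sketch below is plain Java and makes some assumptions: MutableLong is
    a hypothetical stand-in for LongWritable, and ReusingValues is a
    hypothetical stand-in for the reducer's values iterable that refills
    one shared object on each step. It is not Hadoop's actual
    implementation, only an illustration of the aliasing and of the copy
    that avoids it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for LongWritable: a mutable, comparable box.
final class MutableLong implements Comparable<MutableLong> {
    private long value;
    MutableLong(long value) { this.value = value; }
    long get() { return value; }
    void set(long value) { this.value = value; }
    @Override public int compareTo(MutableLong other) {
        return Long.compare(value, other.value);
    }
}

// Hypothetical stand-in for the reducer's values: every next() call
// returns the SAME object, refilled with the next value.
final class ReusingValues implements Iterable<MutableLong> {
    private final long[] data;
    private final MutableLong shared = new MutableLong(0);
    ReusingValues(long... data) { this.data = data; }
    @Override public Iterator<MutableLong> iterator() {
        return new Iterator<MutableLong>() {
            private int next = 0;
            @Override public boolean hasNext() { return next < data.length; }
            @Override public MutableLong next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(data[next++]);
                return shared;
            }
        };
    }
}

final class OffsetSortDemo {
    // Mirrors the question: stores the shared reference, so every list
    // element ends up showing whichever value was read last.
    static List<Long> sortAliased(Iterable<MutableLong> values) {
        List<MutableLong> offsets = new ArrayList<>();
        for (MutableLong val : values) offsets.add(val);
        Collections.sort(offsets);
        return unwrap(offsets);
    }

    // Stores a fresh copy per value, so the sort sees distinct objects.
    static List<Long> sortCopied(Iterable<MutableLong> values) {
        List<MutableLong> offsets = new ArrayList<>();
        for (MutableLong val : values) offsets.add(new MutableLong(val.get()));
        Collections.sort(offsets);
        return unwrap(offsets);
    }

    private static List<Long> unwrap(List<MutableLong> boxes) {
        List<Long> out = new ArrayList<>();
        for (MutableLong box : boxes) out.add(box.get());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortAliased(new ReusingValues(30, 10, 20))); // [20, 20, 20]
        System.out.println(sortCopied(new ReusingValues(30, 10, 20)));  // [10, 20, 30]
    }
}
```

    In a real reducer the equivalent change would be
    offsetList.add(new LongWritable(val.get())) in place of
    offsetList.add(val).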

    --
    Harsh J
    www.harshj.com
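
The reply's suggestion to let MapReduce do the sorting with comparators/groupers is usually called a secondary sort. The sketch below only simulates the idea in plain Java, under stated assumptions: WordOffset is a hypothetical composite key, SORT_ORDER plays the role of a sort comparator, GROUP_ORDER plays the role of a grouping comparator, and shuffle() is a toy substitute for the framework's sort-and-group phase. In an actual job the comparators would be registered with Job.setSortComparatorClass and Job.setGroupingComparatorClass, with a partitioner that partitions on the word alone.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SecondarySortSketch {
    // Hypothetical composite key: the offset travels in the key so the
    // framework's sort can order it.
    static final class WordOffset {
        final String word;
        final long offset;
        WordOffset(String word, long offset) {
            this.word = word;
            this.offset = offset;
        }
    }

    // Sort comparator role: order by word, then by offset.
    static final Comparator<WordOffset> SORT_ORDER =
        Comparator.comparing((WordOffset k) -> k.word)
                  .thenComparingLong(k -> k.offset);

    // Grouping comparator role: keys with the same word share one reduce call.
    static final Comparator<WordOffset> GROUP_ORDER =
        Comparator.comparing((WordOffset k) -> k.word);

    // Toy substitute for the shuffle: sort all keys, then cut the sorted
    // run into groups wherever the grouping comparator sees a change.
    static Map<String, List<Long>> shuffle(List<WordOffset> mapOutput) {
        List<WordOffset> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);
        Map<String, List<Long>> groups = new LinkedHashMap<>();
        WordOffset previous = null;
        List<Long> current = null;
        for (WordOffset key : sorted) {
            if (previous == null || GROUP_ORDER.compare(previous, key) != 0) {
                current = new ArrayList<>();
                groups.put(key.word, current);
            }
            current.add(key.offset); // already in ascending order per word
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<WordOffset> mapOutput = new ArrayList<>();
        mapOutput.add(new WordOffset("b", 7));
        mapOutput.add(new WordOffset("a", 30));
        mapOutput.add(new WordOffset("a", 10));
        mapOutput.add(new WordOffset("b", 2));
        mapOutput.add(new WordOffset("a", 20));
        System.out.println(shuffle(mapOutput)); // {a=[10, 20, 30], b=[2, 7]}
    }
}
```

With this arrangement the reducer receives each word's offsets in order and can stream them out without holding a list in memory.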

Discussion Overview
group: mapreduce-user @
categories: hadoop
posted: Jan 30, '11 at 10:06a
active: Jan 30, '11 at 10:43a
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop
