Hi,
I am running a simple inverted-index program in Hadoop that emits every word in a text file along with its offsets.
So the output key is a Text and the output value is a list of LongWritable.
What I am trying to do is sort the offsets in the reduce function. For each key, I put every value into a List and sort it with Collections.sort().
This is the code snippet:
    offsetList.clear();
    for (LongWritable val : values)
    {
        offsetList.add(val);
    }
    Collections.sort(offsetList);
    for (LongWritable offset : offsetList)
    {
        ......
    }
But it doesn't work. After the loop, offsetList has the same size as values, but every element in it appears to have been overwritten with the same number (the smallest value in values, as far as I can tell).
Can I sort the data in this way?
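One likely explanation is that Hadoop reuses a single LongWritable instance while iterating over values, so the list ends up holding many references to the same object. The standalone sketch below reproduces that effect without Hadoop. MutableLong and reusingValues are hypothetical stand-ins I made up for LongWritable and the reducer's values; they are assumptions for illustration, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseSketch {
    // Hypothetical stand-in for LongWritable: a mutable box around a long.
    static final class MutableLong implements Comparable<MutableLong> {
        private long value;
        void set(long v) { value = v; }
        long get() { return value; }
        @Override
        public int compareTo(MutableLong other) {
            return Long.compare(value, other.value);
        }
    }

    // Hypothetical stand-in for the reducer's values: every call to next()
    // returns the SAME instance, refilled with the next number.
    static Iterable<MutableLong> reusingValues(final long... raw) {
        return () -> new Iterator<MutableLong>() {
            private final MutableLong shared = new MutableLong();
            private int i = 0;
            @Override
            public boolean hasNext() { return i < raw.length; }
            @Override
            public MutableLong next() {
                if (i >= raw.length) throw new NoSuchElementException();
                shared.set(raw[i++]);
                return shared;
            }
        };
    }

    // Same pattern as the snippet in the question: store the reference itself.
    static List<Long> sortKeepingReferences(Iterable<MutableLong> values) {
        List<MutableLong> list = new ArrayList<>();
        for (MutableLong val : values) {
            list.add(val); // every slot points at the one shared object
        }
        Collections.sort(list);
        List<Long> out = new ArrayList<>();
        for (MutableLong m : list) {
            out.add(m.get());
        }
        return out;
    }

    // Possible fix: copy the primitive value out before storing it.
    static List<Long> sortCopyingValues(Iterable<MutableLong> values) {
        List<Long> list = new ArrayList<>();
        for (MutableLong val : values) {
            list.add(val.get()); // independent copy of the current value
        }
        Collections.sort(list);
        return list;
    }

    public static void main(String[] args) {
        System.out.println(sortKeepingReferences(reusingValues(30, 10, 20))); // [20, 20, 20]
        System.out.println(sortCopyingValues(reusingValues(30, 10, 20)));     // [10, 20, 30]
    }
}
```

If this is indeed the cause, the equivalent change in the reducer would be to store a copy, for example offsetList.add(new LongWritable(val.get())), instead of adding val directly.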
Thanks.