Feb 22, 2008 at 1:47 am
The output may be sorted within a single reducer and, indeed, you can
even guarantee that it is sorted, but *only* by the reduce key. The order
in which the values for a key appear is not deterministic.
To sort by value, you need to run another MR job with the count from the
first step as the key and the old reducer's output key as the value. You
will only need an identity mapper. If you use both the count and the key as
the new key and have an empty value, then you can do a two-level sort in one
step.
Hadoop isn't magic. If you want something sorted according to a new
ordering, *something* will have to do the work.
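
A minimal sketch of the two-level sort described above, written in plain Java rather than against the Hadoop API, so it only simulates what the sort phase of the second job would do. The class name ValueSortSketch, the CountWord composite key, and every sample word other than "abc 10" are hypothetical. In a real job the same ordering would live in the key's compareTo or in a custom key comparator; that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ValueSortSketch {

    // Hypothetical composite key for the second job: (count, word), empty value.
    static final class CountWord {
        final long count;
        final String word;

        CountWord(long count, String word) {
            this.count = count;
            this.word = word;
        }
    }

    // Stand-in for the second job's key ordering: count descending, then
    // word ascending to break ties. The count stays numeric on purpose:
    // compared as text, "10" would sort before "9".
    static final Comparator<CountWord> BY_COUNT_DESC_THEN_WORD = (a, b) -> {
        int byCount = Long.compare(b.count, a.count); // arguments swapped: descending
        return byCount != 0 ? byCount : a.word.compareTo(b.word);
    };

    // Simulates the second job: an identity mapper re-keys each (word, count)
    // pair from the first job as a CountWord, and the sort does the rest.
    static List<String> secondJob(Map<String, Long> firstJobOutput) {
        List<CountWord> keys = new ArrayList<>();
        for (Map.Entry<String, Long> e : firstJobOutput.entrySet()) {
            keys.add(new CountWord(e.getValue(), e.getKey()));
        }
        keys.sort(BY_COUNT_DESC_THEN_WORD);

        List<String> lines = new ArrayList<>();
        for (CountWord k : keys) {
            lines.add(k.word + "\t" + k.count);
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("abc", 10L);  // from the example in the thread
        counts.put("def", 10L);  // hypothetical
        counts.put("xyz", 9L);   // hypothetical
        counts.put("pqr", 100L); // hypothetical
        for (String line : secondJob(counts)) {
            System.out.println(line);
        }
    }
}
```

Swapping the arguments in the count comparison is what gives the decreasing order; the word comparison only matters when two counts are equal. With more than one reducer, each output file would be sorted this way on its own, not the files taken together.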
On 2/21/08 5:38 PM, "Tarandeep Singh" wrote:
On Thu, Feb 21, 2008 at 5:34 PM, Ted Dunning wrote:
Use another job step to get the sort done.
But isn't the output of the reduce step sorted?
Also, can I specify that the sort be done in reverse order?
On 2/21/08 5:11 PM, "Tarandeep Singh" wrote:
On Thu, Feb 21, 2008 at 3:46 PM, Tarandeep Singh <email@example.com>
Can I sort the output of the reducer based on the value instead of the key?
Also, can I specify that the output should be sorted in decreasing order?
Mapper output -
and outputs -
e.g. abc 10
I want the output to be sorted based on the value and that too in
decreasing order -
Any suggestions ?
I set the output format to Text, converted the count into text, and
wrote it as the key with aWord as the value. I was expecting the
output to be sorted on the count, but it didn't work that way. Could
anyone explain why?
reducer output -