FAQ
hi,

I run about 60 reduce tasks for a single job, so the outputs of the job are part00 through part59.

Is there a way to write rows sequentially by sorted keys across the output files?

Currently my outputs look like this:

part00)
1
10
12
14

part01)
2
4
6
11
13

part02)
3
5
7
8
9

But my aim is to get the following results:

part00)
1
2
3
4
5

part01)
6
7
8
9
10

part02)
11
12
13
14
15

Is Hadoop able to support this kind of output?

thanks


  • Shi Yu at Mar 22, 2011 at 3:52 pm
    I guess you need to define a Partitioner to send hashed keys to different
    reducers (sorry, I am still using the old API, so probably there is
    something new in the trunk release). Basically you try to segment the
    keys into different zones: 0-10, 11-20, ...

    Maybe check the hashCode() function and see how to categorize these zones?

    Shi
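    The zone idea above can be sketched in plain Java. This shows only the
    partition-assignment logic with a made-up zone width; in a real Hadoop job
    it would live inside the getPartition() method of a custom Partitioner
    subclass registered on the job.

```java
public class RangePartitionDemo {
    // Map a key to a fixed-width zone, capped at the last partition:
    // with zoneWidth = 5 and 3 partitions, keys 1-5 -> 0, 6-10 -> 1, 11+ -> 2.
    static int getPartition(int key, int zoneWidth, int numPartitions) {
        int zone = (key - 1) / zoneWidth;
        return Math.min(zone, numPartitions - 1);
    }

    public static void main(String[] args) {
        System.out.println(getPartition(3, 5, 3));   // zone 0
        System.out.println(getPartition(8, 5, 3));   // zone 1
        System.out.println(getPartition(14, 5, 3));  // zone 2
    }
}
```

    Because each reducer's input keys are sorted by the framework, routing
    contiguous key ranges to consecutive reducers makes the part files
    globally ordered.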

    On 3/22/2011 9:24 AM, JunYoung Kim wrote:
  • Luca Pireddu at Mar 22, 2011 at 4:03 pm

    You can look at TeraSort in the examples to see how to do this. There's even
    a short write-up by Owen O'Malley about it here:
    http://sortbenchmark.org/YahooHadoop.pdf



    --
    Luca Pireddu
    CRS4 - Distributed Computing Group
    Loc. Pixina Manna Edificio 1
    Pula 09010 (CA), Italy
    Tel: +39 0709250452
  • Icebergs at Mar 25, 2011 at 11:24 am
    You should define your own partitioner.

  • Harsh J at Mar 22, 2011 at 3:54 pm
    You are looking for total order partitioning of your map-emitted data.
    Have a look at the TotalOrderPartitioner class/usage.
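    As an illustration of what TotalOrderPartitioner does, the plain-Java demo
    below (split points hard-coded here as an assumption) routes each key to
    the partition whose range contains it by binary-searching the split
    points. In a real job the split points are sampled from the input via
    InputSampler.writePartitionFile and the partitioner is enabled with
    job.setPartitionerClass(TotalOrderPartitioner.class).

```java
import java.util.Arrays;

// Plain-Java illustration of total order partitioning: reducer i receives a
// contiguous key range, so concatenating the part files yields a global sort.
// Hadoop's TotalOrderPartitioner reads its split points from a partition
// file built by sampling the input; they are hard-coded here for the demo.
public class TotalOrderDemo {
    // Keys <= 5 go to partition 0, keys 6-10 to partition 1, the rest to 2.
    static final int[] SPLIT_POINTS = {5, 10};

    static int getPartition(int key) {
        int idx = Arrays.binarySearch(SPLIT_POINTS, key);
        // On a miss, binarySearch encodes the insertion point as -(pos + 1).
        return idx >= 0 ? idx : -idx - 1;
    }

    public static void main(String[] args) {
        for (int key : new int[] {1, 5, 6, 10, 15}) {
            System.out.println(key + " -> partition " + getPartition(key));
        }
    }
}
```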


    --
    Harsh J
    http://harshj.com

Discussion Overview
group: common-user
categories: hadoop
posted: Mar 22, '11 at 2:24p
active: Mar 25, '11 at 11:24a
posts: 5
users: 5
website: hadoop.apache.org...
irc: #hadoop
