The input splits are sampled when we use the total order partitioner. I want to
know how and when this sampling is done. Is it done before the master
allocates tasks to the nodes, since the sampling file also has to be added to the
distributed cache? If so, is the sampling carried out on the master node?
Would the master then have to access the input splits to get the samples?


  • Abc xyz at Aug 9, 2010 at 3:30 pm
    1) The input splits are sampled when we use the total order partitioner provided
    in Hadoop 0.19. I want to know how and when this sampling is done. Is it done
    before the master allocates tasks to the nodes, since the sampling file also has
    to be added to the distributed cache? If so, is the sampling carried out on the
    master node? Would the master then have to access the input splits to get the
    samples?

    2) Also, does the total order partitioner allow ranges in which a key can belong
    to more than one range? I mean something like A, C, D, D, H, Y, where
    keys from A to C are sent to one partition, keys from C to D are sent to a second
    partition, keys with value D can be sent randomly to either the second or third
    partition, and so on. Or are these ranges mutually exclusive?
  • Gang Luo at Aug 9, 2010 at 4:47 pm
    The sampling is done at the master node by accessing the splits before the job
    is submitted. By default, the partitioner sends each key to exactly one
    partition, unless you modify it.

    -Gang
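
To make the sampling step concrete, here is a minimal sketch of the general idea: draw a few keys from each split, sort them, and keep evenly spaced keys as split points. It is not Hadoop's code. The class `SplitPointSketch`, its methods, and the in-memory lists standing in for input splits are all hypothetical names chosen for illustration. As far as I know, in Hadoop itself this work is triggered from the job driver (through `InputSampler`) before the job is submitted, which matches the answer above, and the real sampler handles details this sketch omits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not part of Hadoop.
final class SplitPointSketch {

    // Draw up to samplesPerSplit random keys from each split.
    // Each inner list is an in-memory stand-in for one input split.
    static List<String> sample(List<List<String>> splits, int samplesPerSplit, long seed) {
        Random rnd = new Random(seed);
        List<String> samples = new ArrayList<>();
        for (List<String> split : splits) {
            for (int i = 0; i < samplesPerSplit && !split.isEmpty(); i++) {
                samples.add(split.get(rnd.nextInt(split.size())));
            }
        }
        return samples;
    }

    // Sort the samples and keep numPartitions - 1 evenly spaced keys.
    // These keys are the boundaries between adjacent partitions.
    static List<String> splitPoints(List<String> samples, int numPartitions) {
        if (samples.isEmpty() || numPartitions < 1) {
            throw new IllegalArgumentException("need at least one sample and one partition");
        }
        List<String> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        List<String> points = new ArrayList<>();
        float step = sorted.size() / (float) numPartitions;
        for (int i = 1; i < numPartitions; i++) {
            // Clamp so rounding can never index past the last sample.
            int idx = Math.min(sorted.size() - 1, Math.round(step * i));
            points.add(sorted.get(idx));
        }
        return points;
    }

    public static void main(String[] args) {
        List<List<String>> splits = new ArrayList<>();
        splits.add(Arrays.asList("pear", "apple", "fig"));
        splits.add(Arrays.asList("kiwi", "mango", "date"));
        List<String> samples = sample(splits, 2, 42L);
        System.out.println("split points: " + splitPoints(samples, 3));
    }
}
```

In a real job the resulting split points would be written to the partition file that is shipped to the tasks through the distributed cache, as the question describes.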




    ----- Original Message ----
    From: abc xyz <fabc_xyz111@yahoo.com>
    To: common-user@hadoop.apache.org
    Sent: 2010/8/9 (Mon) 11:30:11 AM
    Subject: Total order partitioner [Modified]


    1) The input splits are sampled when we use the total order partitioner provided

    in Hadoop 0.19. I want to

    know how and when this sampling is done. Is this sampling done before Master
    allocates tasks to the nodes since the sampling file has to be added to
    distributed cache as well. If it is so, is this sampling carried out at master
    node? Then master has to access the input splits for getting the samples?

    2) Also, does total order partitioner allow such ranges where a key can belong
    to more than one ranges? I mean something like this, A, C, D, D, H, Y where
    keys from A and C sent to one partition, Keys from C to D sent to 2nd
    partition, Keys with value D can be sent randomly either to 2nd or 3rd
    partition, and so on. or are these ranges mutually exclusive?
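
On the second question, a lookup over sorted split points can be sketched as a binary search. The sketch below assumes that is how a key is mapped to a partition; `RangeLookupSketch` and `partitionFor` are hypothetical names, not Hadoop's API. It illustrates why the ranges are mutually exclusive: the lookup is a pure function of the key and the split points, so the same key always lands in the same partition, even when a split point such as D is listed twice. The real partitioner may reject duplicate split points outright; that is not checked here.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Hadoop.
final class RangeLookupSketch {

    // Map a key to a partition index in [0, splitPoints.length].
    // splitPoints must be sorted in ascending order.
    static int partitionFor(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Found at index pos: the key opens the range after that split point.
        // Not found: binarySearch returns -(insertionPoint) - 1, and the
        // insertion point is the number of split points smaller than the key.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        // Split points from the example in the question.
        String[] splitPoints = {"A", "C", "D", "D", "H", "Y"};
        for (String key : new String[] {"B", "C", "D", "E", "Z"}) {
            System.out.println(key + " -> partition " + partitionFor(key, splitPoints));
        }
    }
}
```

Sending a key such as D to either of two partitions at random would require a custom partitioner, which is the "unless you modify it" case mentioned in the reply.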

Discussion Overview
group: common-user
category: hadoop
posted: Aug 9, '10 at 3:06p
active: Aug 9, '10 at 4:47p
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop
