In my Hadoop cluster, I've had several drives fail lately (and they've
been replaced). Each time a new empty drive is placed in the cluster,
I run the balancer.

I understand that the balancer will redistribute the load of file
blocks across the nodes.

My question is: will balancer also look at the desired replication of
a file, and if the actual replication of a file is less than the
desired (because the file had blocks stored on the lost drive), will
balancer re-replicate those lost blocks?

If not, is there another tool that will ensure the desired replication
factor of files is satisfied?

If this functionality doesn't exist, I'm concerned that I'm slowly,
silently losing my files as I replace drives, and I may not even
realize it.

Thoughts?


  • Suratna Budalakoti at Jun 24, 2009 at 1:37 am
    Hi all,

    Is there any way to tell, from logs, or by reading/setting a counter, whether a particular mapper was data local, i.e., it ran on the same node as its input data?

    Thanks,
    Suratna
  • Bradford Stephens at Jun 24, 2009 at 1:42 am
    Correct me if I'm wrong, but I think you can tell through the Hadoop
    Web UI: it shows a count of the map tasks that were data-local. You
    can click on that count to see a list of those tasks, then drill
    down to see which nodes they ran on.

  • Jason hadoop at Jun 24, 2009 at 2:42 am
    The namenode constantly receives reports about which datanode holds which
    blocks, and it re-replicates a block whenever it becomes under-replicated.

    On Tue, Jun 23, 2009 at 6:18 PM, Stuart White wrote:

    In my Hadoop cluster, I've had several drives fail lately (and they've
    been replaced). Each time a new empty drive is placed in the cluster,
    I run the balancer.

    I understand that the balancer will redistribute the load of file
    blocks across the nodes.

    My question is: will balancer also look at the desired replication of
    a file, and if the actual replication of a file is less than the
    desired (because the file had blocks stored on the lost drive), will
    balancer re-replicate those lost blocks?

    If not, is there another tool that will ensure the desired replication
    factor of files is satisfied?

    If this functionality doesn't exist, I'm concerned that I'm slowly,
    silently losing my files as I replace drives, and I may not even
    realize it.

    Thoughts?


    --
    Pro Hadoop, a book to guide you from beginner to hadoop mastery,
    http://www.amazon.com/dp/1430219424?tag=jewlerymall
    www.prohadoopbook.com a community for Hadoop Professionals
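
For anyone who wants to confirm this rather than take it on trust, the sketch below shows one way to check for under-replicated blocks after replacing a drive. It runs `hadoop fsck` and reads the count from the summary. It assumes the `hadoop` command is on the PATH and that the summary contains a line of the form "Under-replicated blocks: N (P %)". The helper names `count_under_replicated` and `run_fsck` are made up for this illustration and are not part of Hadoop.

```python
import re
import subprocess

# Assumed shape of the summary line printed by `hadoop fsck`, for example:
#   " Under-replicated blocks:       12 (0.4 %)"
UNDER_REPLICATED = re.compile(r"Under-replicated blocks:\s+(\d+)")


def count_under_replicated(fsck_output):
    """Return the under-replicated block count found in fsck text, or None if absent."""
    match = UNDER_REPLICATED.search(fsck_output)
    return int(match.group(1)) if match else None


def run_fsck(path="/"):
    """Run `hadoop fsck <path>` and return its standard output as text."""
    result = subprocess.run(
        ["hadoop", "fsck", path],
        capture_output=True,
        text=True,
        check=False,  # do not raise on a non-zero exit; we only parse the text
    )
    return result.stdout
```

Calling `count_under_replicated(run_fsck("/"))` shortly after swapping a drive, and again some time later, should show the count dropping toward zero if the namenode is re-replicating as described above. A count that stays above zero would be worth investigating.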

Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 24, '09 at 1:18a
active: Jun 24, '09 at 2:42a
posts: 4
users: 4
website: hadoop.apache.org...
irc: #hadoop
