HDFS load balancing for non-local reads
Hi-

How does the NameNode handle load balancing of non-local reads with multiple
block locations when locality is equal?

I.e., if the client is equidistant (same rack) from 2 DataNodes hosting the
same block, does the NameNode consider current client count or any other
load indicators when deciding which DataNode will satisfy the read request?
Or is the client given a list of all split locations and allowed to
make this choice itself?

Thanks!

-Ben

  • Suresh Srinivas at Jan 5, 2012 at 10:33 pm
    Currently it sorts the block locations as:
    1. local node
    2. local rack node
    3. random order of remote nodes

    See DatanodeManager#sortLocatedBlock(...) and
    NetworkTopology#pseudoSortByDistance(...).

    You can play around with other policies by plugging in a different
    NetworkTopology.
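
The ordering described in this reply can be sketched roughly as follows. This is only an illustration of the three tiers (local node, local rack, everything else), not the Hadoop source: the class `ReadOrderSketch`, the `Node` type, and the `tier` and `sortForReader` helpers are hypothetical names made up for the example. Shuffling ties inside every tier, including the local-rack tier, is an assumption here; the reply only states that remote nodes come back in random order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class ReadOrderSketch {
    /** Hypothetical stand-in for a replica location: a host name plus a rack name. */
    static final class Node {
        final String host;
        final String rack;
        Node(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }
    }

    /** 0 = same host as the reader, 1 = same rack, 2 = any other node. */
    static int tier(Node reader, Node replica) {
        if (replica.host.equals(reader.host)) return 0;
        if (replica.rack.equals(reader.rack)) return 1;
        return 2;
    }

    /** Returns a copy of the replicas ordered by tier, with ties in random order. */
    static List<Node> sortForReader(Node reader, List<Node> replicas, Random rng) {
        List<Node> out = new ArrayList<>(replicas);
        // Assumption: randomize first, then rely on the stable sort to keep ties shuffled.
        Collections.shuffle(out, rng);
        out.sort(Comparator.comparingInt((Node n) -> tier(reader, n)));
        return out;
    }

    public static void main(String[] args) {
        Node reader = new Node("h1", "/r1");
        List<Node> replicas = Arrays.asList(
                new Node("h9", "/r2"),
                new Node("h2", "/r1"),
                new Node("h1", "/r1"));
        // The local node comes first, then the same-rack node, then the remote one.
        for (Node n : sortForReader(reader, replicas, new Random())) {
            System.out.println(n.host + " " + n.rack);
        }
    }
}
```

Note that no load figure appears anywhere in the sort key, which matches the reply: only network distance decides the order.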
  • Ben Clay at Jan 6, 2012 at 2:50 am
    Suresh-

    Thanks for the tips. I'll check those functions out and look into plugging in
    a different NetworkTopology.

    So to clarify, under the current scheme, if we have 1 block on two local
    rack nodes A and B, does it choose randomly between them? I.e., if DataNode A is
    serving 20 clients and DataNode B is serving 1 client, do they both have a 50%
    chance of being selected for the 21st client?

    -Ben



  • Alo.alt at Jan 6, 2012 at 5:45 pm
    Ben,

    That scenario should not happen. If one DN has 20 clients and the other has zero (for the same block), the cluster (or the DN) has another problem. Rack awareness is described here:
    https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf

    - Alex

    --
    Alexander Lorenz
    http://mapredit.blogspot.com
  • Ben Clay at Jan 6, 2012 at 7:56 pm
    Alex-

    Understood. We do not have a situation that extreme; I was just looking for
    conceptual verification that reads are balanced across replicas of equal
    distance. From the PDF you linked:

    "For reading, the name node first checks if the client's computer is located
    in the cluster. If yes, block locations are returned to the client in the
    order of its closeness to the reader. The block is read from data nodes in
    this preference order."

    If two datanodes have equal closeness, I'd like to know how the NameNode
    chooses between them.

    -Ben

  • Alo.alt at Jan 6, 2012 at 11:42 pm
    Ben,

    That's defined in ReplicationTargetChooser: first local, second same rack, then random. You're right, it is 50/50 if cases one and two do not match.

    - Alex

    --
    Alexander Lorenz
    http://mapredit.blogspot.com
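
To make the 50/50 point concrete, here is a small simulation of a load-blind choice between two equally close replicas A and B. It is a sketch under the assumption that the tie is broken by a uniform random pick, as confirmed in the reply above; `EqualDistancePick`, `pick`, and `shareOfA` are hypothetical names for the example, not Hadoop APIs.

```java
import java.util.Random;

class EqualDistancePick {
    /** Load-blind choice between two equally close replicas: 0 means A, 1 means B. */
    static int pick(Random rng) {
        return rng.nextInt(2);
    }

    /** Fraction of trials in which replica A was chosen. */
    static double shareOfA(int trials, long seed) {
        Random rng = new Random(seed);
        int chosenA = 0;
        for (int i = 0; i < trials; i++) {
            if (pick(rng) == 0) {
                chosenA++;
            }
        }
        return (double) chosenA / trials;
    }

    public static void main(String[] args) {
        // A may be serving 20 clients and B only 1; neither count is an input to pick().
        System.out.printf("share of reads sent to A: %.3f%n", shareOfA(100_000, 42L));
    }
}
```

Because the current load of A and B never enters `pick`, the share sent to A stays close to one half however unevenly the two nodes are already loaded.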

Discussion Overview
group: hdfs-user @
categories: hadoop
posted: Jan 5, '12 at 9:40p
active: Jan 6, '12 at 11:42p
posts: 6
users: 3
website: hadoop.apache.org...
irc: #hadoop
