-------------------------------------------------------------------------------------
Key: HADOOP-2060
URL: https://issues.apache.org/jira/browse/HADOOP-2060
Project: Hadoop
Issue Type: Bug
Components: dfs
Reporter: Runping Qi
While tracing through the DFSClient code to see how data locality affects DFS read throughput,
I realized that DFSClient does not appear to use data locality information (at least not in a way that is obvious to me)
when it chooses which of the available replicas of a block to read from.
Here is the relevant code:
{code}
  /**
   * Pick the best node from which to stream the data.
   * Entries in <i>nodes</i> are already in the priority order
   */
  private DatanodeInfo bestNode(DatanodeInfo nodes[],
                                AbstractMap<DatanodeInfo, DatanodeInfo> deadNodes)
      throws IOException {
    if (nodes != null) {
      for (int i = 0; i < nodes.length; i++) {
        if (!deadNodes.containsKey(nodes[i])) {
          return nodes[i];
        }
      }
    }
    throw new IOException("No live nodes contain current block");
  }

  private DNAddrPair chooseDataNode(LocatedBlock block)
      throws IOException {
    int failures = 0;
    while (true) {
      DatanodeInfo[] nodes = block.getLocations();
      try {
        DatanodeInfo chosenNode = bestNode(nodes, deadNodes);
        InetSocketAddress targetAddr = DataNode.createSocketAddr(chosenNode.getName());
        return new DNAddrPair(chosenNode, targetAddr);
      } catch (IOException ie) {
        String blockInfo = block.getBlock() + " file=" + src;
        if (failures >= MAX_BLOCK_ACQUIRE_FAILURES) {
          throw new IOException("Could not obtain block: " + blockInfo);
        }
        if (nodes == null || nodes.length == 0) {
          LOG.info("No node available for block: " + blockInfo);
        }
        LOG.info("Could not obtain block " + block.getBlock() + " from any node: " + ie);
        try {
          Thread.sleep(3000);
        } catch (InterruptedException iex) {
        }
        deadNodes.clear(); //2nd option is to remove only nodes[blockId]
        openInfo();
        failures++;
        continue;
      }
    }
  }
{code}
It seems to pick the first replica in the list that is not marked dead.
This means that even when a replica is local to the node where the client runs,
the client may still read from a remote one.
Map/reduce tries to schedule each mapper on a node that holds a local copy of its input split.
However, if DFSClient does not use that information when choosing a replica to read, the mapper may well have to read the data
over the network even though a local replica is available.
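To make the concern concrete, here is a minimal sketch of what a locality-aware choice could look like. It is not DFSClient code: the class LocalityAwarePicker, the method pickReplica, and the use of plain host-name strings in place of DatanodeInfo are all hypothetical, and it returns an empty Optional where bestNode() would throw an IOException.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch, not Hadoop code: replicas are modeled as host-name strings.
class LocalityAwarePicker {

  /**
   * Prefer a live replica on localHost. If there is none, fall back to the
   * first live replica in list order, which is what bestNode() does today.
   * Returns an empty Optional when every replica is marked dead.
   */
  static Optional<String> pickReplica(List<String> replicaHosts,
                                      Set<String> deadHosts,
                                      String localHost) {
    String firstLive = null;
    for (String host : replicaHosts) {
      if (deadHosts.contains(host)) {
        continue;                      // skip replicas already marked dead
      }
      if (host.equals(localHost)) {
        return Optional.of(host);      // a local replica wins immediately
      }
      if (firstLive == null) {
        firstLive = host;              // remember the first live remote replica
      }
    }
    return Optional.ofNullable(firstLive);
  }

  public static void main(String[] args) {
    List<String> replicas = Arrays.asList("hostA", "hostB", "hostC");
    Set<String> dead = new HashSet<>();
    // Client runs on hostB: the local replica is chosen although hostA is listed first.
    System.out.println(pickReplica(replicas, dead, "hostB"));
    // Client runs on a host with no replica: falls back to list order.
    System.out.println(pickReplica(replicas, dead, "hostZ"));
  }
}
```

If the replica list handed to the client is already sorted so that a local replica comes first, the fallback branch alone gives the same result, and the question becomes whether that ordering is actually guaranteed.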
I hope I have missed something or misunderstood the code.
Otherwise, this will be a serious performance problem.