[ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733651#action_12733651 ]
Jean-Daniel Cryans commented on HBASE-1672:
-------------------------------------------
We already do this inside TableInputFormatBase:
{code}
// Hostname of the region server currently hosting the split's start key.
String regionLocation = table.getRegionLocation(startKeys[startPos]).
    getServerAddress().getHostname();
// That single hostname is handed to the split as its location.
splits[i] = new TableSplit(this.table.getTableName(),
    startKeys[startPos], ((i + 1) < realNumSplits) ? startKeys[lastPos] :
    HConstants.EMPTY_START_ROW, regionLocation);
LOG.info("split: " + i + "->" + splits[i]);
{code}
I don't know if we can do anything more than that. One difference between HBase and mapred on HDFS is that a region is served by only one node, whereas an HDFS block lives on 3 nodes with the default replication factor. Each split therefore advertises a single host, so getting the right map task onto the right RS at the right moment may be difficult for the JobTracker.
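To make the single-location point concrete, here is a minimal sketch in plain Java (no HBase or Hadoop classes). The class and method names, the host names, and the 12-node cluster size are all hypothetical. The second method is a toy model that assumes a free task slot shows up on a uniformly random node; it is not how the JobTracker actually schedules, only an illustration of why one advertised host leaves less room than three.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of HBase or Hadoop.
class LocalitySketch {

    // A task counts as data-local when the node that runs it is one of the
    // hosts advertised by its split.
    public static boolean isDataLocal(List<String> splitHosts, String assignedHost) {
        return splitHosts.contains(assignedHost);
    }

    // Toy model (assumption): a free slot appears on a uniformly random node,
    // so the chance it is local to a given split is advertisedHosts / clusterNodes.
    public static double chanceRandomSlotIsLocal(int advertisedHosts, int clusterNodes) {
        if (clusterNodes <= 0 || advertisedHosts < 0) {
            throw new IllegalArgumentException("need clusterNodes > 0 and advertisedHosts >= 0");
        }
        return Math.min(advertisedHosts, clusterNodes) / (double) clusterNodes;
    }

    public static void main(String[] args) {
        // A region is served by one RS, so its split names one host.
        List<String> regionSplit = Arrays.asList("node07");
        // An HDFS block with the default replication factor names three hosts.
        List<String> hdfsSplit = Arrays.asList("node02", "node07", "node11");

        System.out.println(isDataLocal(regionSplit, "node02")); // false
        System.out.println(isDataLocal(hdfsSplit, "node02"));   // true

        // Hypothetical 12-node cluster: one candidate host versus three.
        System.out.println(chanceRandomSlotIsLocal(1, 12));
        System.out.println(chanceRandomSlotIsLocal(3, 12));
    }
}
```

Under this simplified model a region-backed split is three times less likely than a 3-replica HDFS split to meet a free slot on a matching node, which fits the direction of the reported numbers without explaining them exactly.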
Map tasks not local to RS
-------------------------
Key: HBASE-1672
URL: https://issues.apache.org/jira/browse/HBASE-1672
Project: Hadoop HBase
Issue Type: Bug
Components: mapred, master, regionserver
Affects Versions: 0.20.0, 0.19.3
Environment: DN, TT and RS running on the same nodes.
Reporter: Amandeep Khurana
Fix For: 0.20.0, 0.19.4
The number of data-local map tasks while scanning a table is only about 10% of the total map tasks...
My table had 280 regions and 13M records... The number of map tasks in the scan job was equal to the number of regions (280). Only 25 of them were data-local tasks.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.