Grokbase Groups HBase dev July 2009
Map tasks not local to RS
-------------------------

Key: HBASE-1672
URL: https://issues.apache.org/jira/browse/HBASE-1672
Project: Hadoop HBase
Issue Type: Bug
Affects Versions: 0.19.3
Environment: DN, TT and RS running on the same nodes.
Reporter: Amandeep Khurana


The number of data local map tasks is only about 10% of the total map tasks...

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Amandeep Khurana (JIRA) at Jul 19, 2009 at 6:28 am
    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amandeep Khurana updated HBASE-1672:
    ------------------------------------

    Component/s: regionserver, master, mapred
  • Amandeep Khurana (JIRA) at Jul 19, 2009 at 6:56 am
    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Amandeep Khurana updated HBASE-1672:
    ------------------------------------

    Description:
    The number of data-local map tasks while scanning a table is only about 10% of the total map tasks...
    My table had 280 regions and 13M records... The number of map tasks in the scan job was equal to the number of regions (280). Only 25 of them were data-local tasks.

    was:The number of data local map tasks is only about 10% of the total map tasks...
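For reference, the figures in the updated description work out to slightly under the "about 10%" quoted. A minimal sketch of that arithmetic; the class and method names are hypothetical:

```java
// Minimal sketch: the data-locality ratio from the counts in the description
// (280 map tasks, 25 of them data-local). Names are illustrative only.
public class LocalityRatio {
    // Fraction of map tasks that ran on a node hosting their input.
    static double localityRatio(int dataLocalTasks, int totalTasks) {
        if (totalTasks <= 0) {
            throw new IllegalArgumentException("totalTasks must be positive");
        }
        return (double) dataLocalTasks / totalTasks;
    }

    public static void main(String[] args) {
        double ratio = localityRatio(25, 280);
        System.out.printf("data-local: %.1f%%%n", ratio * 100); // prints "data-local: 8.9%"
    }
}
```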

  • Jonathan Gray (JIRA) at Jul 20, 2009 at 4:08 pm
    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Jonathan Gray updated HBASE-1672:
    ---------------------------------

    Affects Version/s: 0.20.0
    Fix Version/s: 0.19.4, 0.20.0

    Bringing into 0.20.0 so someone can verify whether this works on trunk. I can do it later this week if no one else does.
  • Jonathan Gray (JIRA) at Jul 20, 2009 at 4:08 pm
    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733256#action_12733256 ]

    Jonathan Gray commented on HBASE-1672:
    --------------------------------------

    This needs to be tested on trunk; I thought we had fixed this.
  • Jean-Daniel Cryans (JIRA) at Jul 21, 2009 at 2:56 pm
    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733651#action_12733651 ]

    Jean-Daniel Cryans commented on HBASE-1672:
    -------------------------------------------

    We already do this inside TableInputFormatBase:

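    {code}
    // Excerpt from TableInputFormatBase (not self-contained): each split records the
    // hostname of the region server hosting its first region, so the JobTracker can
    // try to schedule the map task on that host.
    String regionLocation = table.getRegionLocation(startKeys[startPos])
        .getServerAddress().getHostname();
    // The split ends at the start key of the next group of regions, or at the
    // empty row (meaning "end of table") for the last split.
    splits[i] = new TableSplit(this.table.getTableName(),
        startKeys[startPos],
        ((i + 1) < realNumSplits) ? startKeys[lastPos] : HConstants.EMPTY_START_ROW,
        regionLocation);
    LOG.info("split: " + i + "->" + splits[i]);
    {code}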

    I don't know if we can do anything more than that. One difference between HBase and MapReduce over plain HDFS is that a region is served by only one node, whereas an HDFS block lives on 3 nodes under the default replication factor. So getting the right map task onto the right RS at the right moment may be difficult for the JobTracker.
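The idea behind the quoted code is that every split already names its hosting region server as the preferred location. A minimal, self-contained sketch of that idea, assuming one split per region (as in the reporter's job, where map tasks equaled regions) and using a hypothetical `Split` class and host map instead of HBase's `TableSplit` and `HTable` APIs:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical simplification of the TableInputFormatBase logic quoted above:
// one split per region (no grouping of regions into fewer splits), each tagged
// with the hostname of the region server hosting that region.
public class RegionSplits {
    // Stand-in for HConstants.EMPTY_START_ROW: an empty end row means "to the end of the table".
    static final byte[] EMPTY_ROW = new byte[0];

    static final class Split {
        final byte[] startRow;
        final byte[] endRow;
        final String location; // host the scheduler should prefer for this map task

        Split(byte[] startRow, byte[] endRow, String location) {
            this.startRow = startRow;
            this.endRow = endRow;
            this.location = location;
        }
    }

    // startKeys: region start keys in table order (the first region's start key is empty).
    // hostByStartKey: region start key (UTF-8 string) -> hostname of its region server.
    static List<Split> buildSplits(byte[][] startKeys, Map<String, String> hostByStartKey) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < startKeys.length; i++) {
            // A region ends where the next one starts; the last region is open-ended.
            byte[] endRow = (i + 1 < startKeys.length) ? startKeys[i + 1] : EMPTY_ROW;
            String host = hostByStartKey.get(new String(startKeys[i], StandardCharsets.UTF_8));
            splits.add(new Split(startKeys[i], endRow, host));
        }
        return splits;
    }

    public static void main(String[] args) {
        byte[][] startKeys = { EMPTY_ROW, "m".getBytes(StandardCharsets.UTF_8) };
        // Made-up hostnames for illustration.
        Map<String, String> hosts = Map.of("", "node1", "m", "node2");
        for (Split s : buildSplits(startKeys, hosts)) {
            System.out.println("[" + new String(s.startRow, StandardCharsets.UTF_8) + ", "
                    + new String(s.endRow, StandardCharsets.UTF_8) + ") -> " + s.location);
        }
    }
}
```

The split only expresses a preference. With a single RS per region, each task has exactly one local host, rather than the three candidate hosts that a default-replicated HDFS block offers.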
  • stack (JIRA) at Jul 21, 2009 at 6:35 pm
    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733774#action_12733774 ]

    stack commented on HBASE-1672:
    ------------------------------

    So, what is the indicator in the MR UI measuring? TT+DN locality? Or TT+RS? If the latter, and only about 10% of the time are we running the mapper on the TT local to the server hosting the region, then our TT+RS locality would seem to be broken, or ineffective (either would be good to know).
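The question hinges on two different notions of locality. The sketch below uses made-up hostnames and a hypothetical `LocalityKinds` class to show that a task can be local to a DataNode holding one of the region's block replicas without being local to the one region server that serves the region. Which notion a locality counter reflects depends on the location the input split advertises; the TableSplit construction quoted earlier passes the region server's hostname.

```java
import java.util.Set;

// Hypothetical illustration: TT+RS locality versus TT+DN locality for one map task.
// Assumes all hostnames are already normalized to the same form (plain string comparison).
public class LocalityKinds {
    // TT+RS: the task ran on the host whose region server serves the scanned region.
    static boolean rsLocal(String taskHost, String regionServerHost) {
        return taskHost.equals(regionServerHost);
    }

    // TT+DN: the task ran on a host that stores a replica of the region's HDFS blocks.
    static boolean dnLocal(String taskHost, Set<String> blockReplicaHosts) {
        return blockReplicaHosts.contains(taskHost);
    }

    public static void main(String[] args) {
        Set<String> blockReplicaHosts = Set.of("node1", "node2", "node3"); // 3 replicas
        String regionServerHost = "node1"; // exactly one server hosts the region
        String taskHost = "node2";
        System.out.println("TT+RS local: " + rsLocal(taskHost, regionServerHost)); // false
        System.out.println("TT+DN local: " + dnLocal(taskHost, blockReplicaHosts)); // true
    }
}
```

Because the comparison is a plain string match, a check like this would also report a miss if the two sides used different forms of the same host (for example a short name versus a fully qualified one).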
  • stack (JIRA) at Jul 24, 2009 at 10:06 pm
    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    stack resolved HBASE-1672.
    --------------------------

    Resolution: Cannot Reproduce

    I ran a rowcounter job against a 100-region table of ~20M rows. The cluster was small (4 regionservers), with tasktrackers running beside the RSs. Every task was scheduled on the TT local to its RS ("Input Split Locations" always had the same value as "Machine" on the taskdetails page).
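The check described above can be done mechanically. This is a minimal sketch that assumes the per-task "Input Split Locations" and "Machine" values have been copied by hand from the taskdetails pages into a hypothetical `TaskRow` class (not a Hadoop API). It returns the tasks that did not run on the host their split asked for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: compare each map task's "Input Split Locations" value with the
// "Machine" it ran on. TaskRow is illustrative, not a Hadoop API.
public class SplitLocationAudit {
    static final class TaskRow {
        final String taskId;
        final String inputSplitLocation;
        final String machine;

        TaskRow(String taskId, String inputSplitLocation, String machine) {
            this.taskId = taskId;
            this.inputSplitLocation = inputSplitLocation;
            this.machine = machine;
        }
    }

    // Returns the IDs of tasks that did not run on the host named by their split.
    static List<String> nonLocalTasks(List<TaskRow> rows) {
        List<String> misses = new ArrayList<>();
        for (TaskRow r : rows) {
            if (!r.inputSplitLocation.equals(r.machine)) {
                misses.add(r.taskId);
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only.
        List<TaskRow> rows = List.of(
                new TaskRow("m_000000", "node1", "node1"),
                new TaskRow("m_000001", "node2", "node3"));
        System.out.println("non-local: " + nonLocalTasks(rows)); // non-local: [m_000001]
    }
}
```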
  • Jonathan Gray (JIRA) at Jul 24, 2009 at 11:32 pm
    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735207#action_12735207 ]

    Jonathan Gray commented on HBASE-1672:
    --------------------------------------

    Thank you for researching, stack.

    Next week we'll have a lot of MR jobs running on trunk, so we will report if we find anything strange.
  • Amandeep Khurana (JIRA) at Jul 25, 2009 at 5:21 pm
    [ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735302#action_12735302 ]

    Amandeep Khurana commented on HBASE-1672:
    -----------------------------------------

    I had this issue in 0.19, but I'm not facing the problem in 0.20.

Discussion Overview
group: dev @
categories: hbase, hadoop
posted: Jul 19, '09 at 6:26a
active: Jul 25, '09 at 5:21p
posts: 10
users: 1
website: hbase.apache.org

1 user in discussion

Amandeep Khurana (JIRA): 10 posts
