Hello everyone,

I am running a MapReduce job where the map task executes one GET for each key/value pair it processes.

The map tasks that run first complete quickly (in 2 minutes, for example), but the next map tasks need much more time (4 minutes), and the ones after that need more than 15 minutes to complete.

It seems that HBase becomes overloaded and cannot respond fast enough.

While the MR job is running I have noticed the following:

1) The CPU usage of the map tasks is high at the beginning and then goes down to 4-5%. I think this means that the results of the GET command take a long time to be returned.

2) The used stack of the RegionServers (as shown in the web GUI) increases and it doesn't decrease even when the job is completed.

3) Using the "top" command, I see that the memory used by the regionserver increases up to the stack limit I have selected (2GB) and it doesn't go down even when the job is completed.

Has anyone noticed something like that?

Can you please help me?

Thank you for your help,
Panagiotis


  • Stack at Sep 7, 2011 at 3:33 pm

    2011/9/7 Panagiotis Antonopoulos <antonopoulospan@hotmail.com>:
    > Although the map tasks which run first complete fast (in 2 minutes for example) then the next map tasks need much more time to complete (4mins) and even later the following map tasks need more than 15 mins to complete.

    Are all maps in flight when some complete in 2 minutes? What is
    happening with i/o as we go from 2-15 minutes? Is it going up as time
    progresses? What about the network? What is the map doing? A get
    only? Or is it also populating the cluster, so there is more data in the
    system when maps are taking longer to complete? Do you have many
    regions? Are they evenly distributed, etc.?

    > It seems like HBase overloads and cannot respond fast enough.
    >
    > While the MR job is running I have noticed the following:
    >
    > 1) The cpu usage of the map tasks is high at the beginning and then goes down to 4-5%. I think that this means that the results of the GET command take long to be returned.

    This could be. Does iowait go up as the job progresses?

    > 2) The used stack of the RegionServers (as shown in the web GUI) increases and it doesn't decrease even when the job is completed.

    You mean heap used? Yeah, that's the general tendency of Java apps. There
    is no facility to 'shrink' the allocated heap when done.

    > 3) Using the "top" command, I see that the memory used by the regionserver increases up to the stack limit I have selected (2GB) and it doesn't go down even when the job is completed.

    See above.
    St.Ack
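
The distinction drawn above between heap in use and heap the JVM has allocated can be observed from inside any Java process. The following is a minimal sketch using only the standard `java.lang.management` API; the class name `HeapSnapshot` and its helper methods are illustrative and are not part of HBase.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Prints used vs. committed heap for the JVM this code runs in. */
class HeapSnapshot {
    /** Formats a byte count as whole mebibytes. */
    static String mib(long bytes) {
        return (bytes / (1024 * 1024)) + " MiB";
    }

    /** Describes one MemoryUsage reading on a single line. */
    static String describe(MemoryUsage heap) {
        // getMax() returns -1 when no maximum is defined.
        String max = heap.getMax() < 0 ? "undefined" : mib(heap.getMax());
        return "used=" + mib(heap.getUsed())
            + " committed=" + mib(heap.getCommitted())
            + " max=" + max;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println(describe(heap));
    }
}
```

Here `used` is what live and not-yet-collected objects occupy, `committed` is what the JVM has reserved from the operating system, and `max` corresponds to the configured limit (for example via `-Xmx`). A process-level view such as `top` is closer to the committed figure, which is why it can stay at the limit after the work is done.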
  • Panagiotis Antonopoulos at Sep 7, 2011 at 5:31 pm
    St.Ack you are always helping us!
    Thank you very much!!!

    The cluster has an NFS where the default directory of all users is stored (when I log in, my working directory is on the NFS).
    I have Hadoop and HBase on the local filesystem of each node. However, is there any possibility that HBase uses the NFS?
    Should I set any parameters other than those below?

    hbase-site.xml is the following:

    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://clone11:9000/hbase</value>
      <description>The directory shared by RegionServers.</description>
    </property>

    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
      <description>The mode the cluster will be in. Possible values are
        false: standalone and pseudo-distributed setups with managed Zookeeper
        true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
      </description>
    </property>

    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>clone11</value>
      <description>A</description>
    </property>

    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2181</value>
      <description>A</description>
    </property>

    <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/local/panton/hadoop-0.20.2-cdh3u0/dfs/zoo</value>
      <description>A</description>
    </property>

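One quick way to reason about the NFS question is to look at the URI scheme of each configured location in the hbase-site.xml posted in this message. The sketch below uses only `java.net.URI`; the class name `RootDirCheck` is hypothetical, and the two values are copied from that configuration.

```java
import java.net.URI;

/** Reports the URI scheme of a configured location, if it has one. */
class RootDirCheck {
    /** Returns the scheme (e.g. "hdfs"), or "(no scheme)" for a plain path. */
    static String schemeOf(String value) {
        String scheme = URI.create(value).getScheme();
        return scheme == null ? "(no scheme)" : scheme;
    }

    public static void main(String[] args) {
        // hbase.rootdir from the posted configuration
        System.out.println(schemeOf("hdfs://clone11:9000/hbase"));
        // hbase.zookeeper.property.dataDir from the posted configuration
        System.out.println(schemeOf("/local/panton/hadoop-0.20.2-cdh3u0/dfs/zoo"));
    }
}
```

An `hdfs` scheme means HBase stores its table data through HDFS rather than writing to a mounted directory itself. A plain path such as the ZooKeeper data directory is a location on the node, and whether that location sits on a local disk or on an NFS mount depends on the node's mount table, which this check cannot tell you.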
    > You mean heap used?

    Yes, you are totally right. I got mixed up.

    > Are all maps in flight when some complete in 2 minutes?

    Yes. There are always 48 maps running.

    > What is happening with i/o as we go from 2-15 minutes? Is it going up as time progresses? What about the network? Does iowait go up as job progresses?

    How can I monitor the i/o and the network? Can you please suggest a tool? I am not the admin of the cluster, so I may have to ask the admins to install it.

    i/o wait (using the top command) is generally steady at around 0-15% even when the tasks take much longer, going up only for a few moments.
    The idle percentage is always really high.
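
On Linux, the i/o wait figure that `top` shows is derived from the cumulative counters on the aggregate `cpu` line of `/proc/stat`, so it can also be computed without installing anything. The sketch below assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal, ...); the class name `IoWait` and the sample lines in `main` are made up for illustration.

```java
/** Computes the iowait share of CPU time between two /proc/stat samples. */
class IoWait {
    /** Parses the numeric fields of an aggregate "cpu ..." line. */
    static long[] parse(String cpuLine) {
        String[] parts = cpuLine.trim().split("\\s+");
        long[] ticks = new long[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            ticks[i - 1] = Long.parseLong(parts[i]);
        }
        return ticks;
    }

    /** Percentage of elapsed CPU time spent in iowait between two samples. */
    static double iowaitPercent(String before, String after) {
        long[] a = parse(before);
        long[] b = parse(after);
        // Sum at most the first eight fields (user..steal); the guest
        // fields that may follow are already included in user and nice.
        int n = Math.min(8, Math.min(a.length, b.length));
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += b[i] - a[i];
        }
        long iowait = b[4] - a[4]; // fifth field is iowait
        return total == 0 ? 0.0 : 100.0 * iowait / total;
    }

    public static void main(String[] args) {
        // Invented sample lines; on a real node, read the first line of
        // /proc/stat twice, a few seconds apart.
        String before = "cpu  100 0 50 800 50 0 0 0";
        String after = "cpu  200 0 100 1500 200 0 0 0";
        System.out.printf("iowait: %.1f%%%n", iowaitPercent(before, after));
    }
}
```

Sampling this on the region server nodes at intervals while the job runs would show whether i/o wait climbs as the map tasks slow down, which is the question being asked.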

    > What is the map doing? A get only? Or is it also populating the cluster so more data in the system when maps are taking longer to complete.

    They do a GET to load a string from HBase and compare it with a string that comes as input.
    When I use an empty table with the GET, the CPU usage is really high, i/o wait is low, the map tasks complete much faster, and the time is steady across all map tasks.

    On the other hand, if I keep the GET with the normal table (which has many rows) and remove all context.write() calls, the problem remains. However, it gets a bit smaller: the first tasks need 2-3 minutes and the next ones need about 6-7 minutes.

    This is why I believe it has to do with HBase and the GET. Do you think this is a correct assumption?
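
The reasoning above (empty table versus full table, with and without context.write()) points at the time spent inside each GET. One way to check that directly is to time every call in the mapper. The following is a minimal sketch in plain Java: `CallTimer` is a hypothetical helper, and the `HashMap` lookup in `main` is only a stand-in for the real HBase call, which would be wrapped in the same way.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Accumulates count, mean, and maximum latency of timed calls. */
class CallTimer {
    private long calls;
    private long totalNanos;
    private long maxNanos;

    /** Runs the call, records how long it took, and returns its result. */
    <T> T time(Callable<T> call) throws Exception {
        long start = System.nanoTime();
        try {
            return call.call();
        } finally {
            // Recorded even when the call throws.
            long elapsed = System.nanoTime() - start;
            calls++;
            totalNanos += elapsed;
            maxNanos = Math.max(maxNanos, elapsed);
        }
    }

    long calls() { return calls; }

    double meanMillis() { return calls == 0 ? 0.0 : totalNanos / 1e6 / calls; }

    double maxMillis() { return maxNanos / 1e6; }

    public static void main(String[] args) throws Exception {
        // Stand-in for the table being read; not an HBase API.
        Map<String, String> standIn = new HashMap<>();
        standIn.put("row1", "value1");

        CallTimer timer = new CallTimer();
        for (int i = 0; i < 1000; i++) {
            timer.time(() -> standIn.get("row1"));
        }
        System.out.printf("calls=%d mean=%.4f ms max=%.4f ms%n",
            timer.calls(), timer.meanMillis(), timer.maxMillis());
    }
}
```

If the mean GET latency reported by early map tasks is much lower than that reported by later ones, that would support the assumption that the slowdown is on the HBase side rather than in the map logic.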

    > Do you have many regions? Are they evenly distributed, etc.

    Yes, I always take care to split the table and keep the regions evenly distributed.





    Date: Wed, 7 Sep 2011 08:32:29 -0700
    Subject: Re: HBase slowdown while running MR job with GET
    From: stack@duboce.net
    To: user@hbase.apache.org


Discussion Overview
group: user @
categories: hbase, hadoop
posted: Sep 7, '11 at 11:46a
active: Sep 7, '11 at 5:31p
posts: 3
users: 2
website: hbase.apache.org
