Grokbase Groups HBase user June 2010
Hi Jean,

It happened again today during a server restart, which involved a hadoop
start followed by an hbase start.
There was also an exception when the hbase master came up, while reading a file
from hadoop. Not sure if that is the problem.
I have pasted those logs too.


Current state of the system: master, zookeeper, region servers are all up.
But region servers are not connected to master.

Here are the logs ....


1. logs on hbase master and hadoop namenode.
hbase-master.out: http://pastebin.com/6a88nRh5
hadoop-namenode: http://pastebin.com/wHP5uQBh

2. syslog on hbase master.
http://pastebin.com/S9KVVsSf

3. syslog on hbase regionservers. Posted one; the other is the same.
http://pastebin.com/kR42Xt2t


I did a netstat -tna to confirm that master is listening on port
127.0.0.121:60000
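
A netstat on the master only shows that a listener exists. A connection attempt made from a region server host exercises the path that is actually failing with "Connection refused". The following is a minimal sketch of such a check, not part of the original post; the function name `can_connect` and the default timeout are illustrative assumptions.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Run on each region server host, `can_connect(master_host, 60000)` (with `master_host` set to whatever master address the region servers are configured with) would show whether the master port is reachable from that machine.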

I did a restart of the regionservers only and they are able to connect fine.


thanks
ishwar

On Fri, Jun 11, 2010 at 12:56 PM, Jean-Daniel Cryans wrote:

You can check the general health by using the webui, it runs on the
master node at port 60010.
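
For automated monitoring, the web UI can also be probed from a script. This is a rough sketch under a few assumptions: the function name `master_ui_up` is made up for illustration, and it requests the root path of the UI, which may differ between HBase versions. It only confirms that the UI answers; the number of connected region servers still has to be read from the page itself.

```python
import http.client
import urllib.error
import urllib.request


def master_ui_up(host: str, port: int = 60010, timeout: float = 5.0) -> bool:
    """Return True if the master web UI answers an HTTP GET with status 200."""
    url = f"http://{host}:{port}/"  # root path is an assumption
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, timeouts, and HTTP error statuses all count as "down".
        return False
```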

For the errors, the context you gave is so limited that giving any
meaningful answer is impossible. Please post full logs on a web server
or on pastebin.com (or your preferred code pasting site) if it fits.

J-D

On Fri, Jun 11, 2010 at 12:48 PM, ishwar ramani wrote:
Hi,

I have a hbase hadoop cluster setup. 6 days back we did a cold restart of
our system.
I recently noticed that a hbase query was timing out with

org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
to locate root region


I looked at the master logs and none of the region servers had connected

2010-06-04 00:00:21,510 INFO
org.apache.hadoop.hbase.master.ServerManager: 0
region servers, 0 dead, average load NaN


The master had a stderr output when it started

java.io.EOFException
....
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not
complete write to file /hbase/devLogsTable/1225469767/oldlogfile.log by
DFSClient_-107490689

The regionservers have been trying to connect with the master ever since
with the error

2010-06-03 14:33:28,960 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to
master. Retrying. Error was: java.net.ConnectException: Connection refused

All the region server and master processes are running now, except that none
of the region servers are connected.


My first question is how to monitor this problem. I monitor the processes and
they are all fine, and none of the logs report an error.
How do I check the general health of the cluster?
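
One signal already present in the logs quoted above is the periodic ServerManager report on the master ("0 region servers, 0 dead, average load NaN"). A monitoring script could parse that line and alert when the live count is zero. The sketch below assumes the report appears on a single line in the actual log file, in the format quoted in this message; the function name `parse_server_report` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "... ServerManager: 0 region servers, 0 dead, average load NaN"
REPORT_RE = re.compile(
    r"(\d+)\s+region servers,\s+(\d+)\s+dead,\s+average load\s+(\S+)"
)


def parse_server_report(line: str) -> Optional[Tuple[int, int, float]]:
    """Return (live, dead, average_load), or None if the line is not a report."""
    match = REPORT_RE.search(line)
    if match is None:
        return None
    live, dead, load = match.groups()
    # float() accepts "NaN", which the master prints when no servers are connected.
    return int(live), int(dead), float(load)


def no_region_servers(line: str) -> bool:
    """True if the line is a ServerManager report showing zero live region servers."""
    report = parse_server_report(line)
    return report is not None and report[0] == 0
```

Feeding the most recent report line from the master log to `no_region_servers` would flag the state described here, where every process is running but no region server has checked in.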


My second question is why did this happen?

thanks
ishwar

Discussion Overview
group: user @
categories: hbase, hadoop
posted: Jun 11, '10 at 7:48p
active: Jun 16, '10 at 7:55p
posts: 3
users: 2
website: hbase.apache.org