Grokbase Groups HBase user June 2010
Hi,

I have an HBase cluster running on Hadoop. Six days ago we did a cold
restart of the system.
I recently noticed that an HBase query was timing out with:

org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
to locate root region


I looked at the master logs, and none of the region servers had connected:

2010-06-04 00:00:21,510 INFO org.apache.hadoop.hbase.master.ServerManager: 0
region servers, 0 dead, average load NaN


The master wrote the following to stderr when it started:

java.io.EOFException
....
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not
complete write to file /hbase/devLogsTable/1225469767/oldlogfile.log by
DFSClient_-107490689

The region servers have been trying to connect to the master ever since,
failing with this error:

2010-06-03 14:33:28,960 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to
master. Retrying. Error was: java.net.ConnectException: Connection refused


The master and all the region server processes are running now, but none of
the region servers are connected to the master.


My first question is how to monitor for this problem. I monitor the
processes, and they are all running, and none of the logs report an error.
How do I check the general health of the cluster?
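
One possible starting point, offered only as a rough sketch: the master log
already prints a ServerManager summary line (quoted above), so a monitor could
parse the most recent one and alert when the live region server count is zero.
The function names and the min_live threshold below are made up for
illustration, and the pattern assumes the summary line keeps the format shown
in this post.

```python
import re

# Matches the ServerManager summary line quoted in this post, e.g.
# "...ServerManager: 0 region servers, 0 dead, average load NaN".
# \s+ also tolerates the line being wrapped, as it is in the email.
SUMMARY = re.compile(
    r"ServerManager:\s+(\d+)\s+region servers,\s+(\d+)\s+dead,"
    r"\s+average load\s+(\S+)"
)


def latest_server_counts(log_text):
    """Return (live, dead) from the last summary line, or None if absent."""
    matches = SUMMARY.findall(log_text)
    if not matches:
        return None
    live, dead, _load = matches[-1]
    return int(live), int(dead)


def is_healthy(log_text, min_live=1):
    """Hypothetical check: healthy if at least min_live servers reported in."""
    counts = latest_server_counts(log_text)
    return counts is not None and counts[0] >= min_live
```

Feeding this the contents of the master log would have flagged the
"0 region servers" state even though every process was still running. It only
reads what the master reports, so it says nothing about why the region servers
failed to connect.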


My second question is: why did this happen?

thanks
ishwar
