I ran some performance tests on HBase version 0.90.1.
When the name node crashed, I found that some data was lost.
I'm not sure exactly what caused it. It looks like log splitting failed.
I think the master should shut itself down when HDFS crashes.

The logs are:
2011-03-22 13:21:55,056 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs
java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
at org.apache.hadoop.ipc.Client.call(Client.java:820)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
at $Proxy5.getListing(Unknown Source)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy5.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:614)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:252)
at org.apache.hadoop.hbase.master.LogCleaner.chore(LogCleaner.java:121)
at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
at org.apache.hadoop.hbase.master.LogCleaner.run(LogCleaner.java:154)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
at org.apache.hadoop.ipc.Client.call(Client.java:788)
... 13 more
2011-03-22 13:21:56,056 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
2011-03-22 13:21:57,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
2011-03-22 13:21:58,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
2011-03-22 13:21:59,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
2011-03-22 13:22:00,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
2011-03-22 13:22:01,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
2011-03-22 13:22:02,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
2011-03-22 13:22:03,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
2011-03-22 13:22:04,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
2011-03-22 13:22:05,060 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
2011-03-22 13:22:05,060 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting hdfs://C4C1:9000/hbase/.logs/C4C9.site,60020,1300767633398
java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
at org.apache.hadoop.ipc.Client.call(Client.java:820)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
at $Proxy5.getFileInfo(Unknown Source)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy5.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:623)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:461)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:690)
at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:177)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:95)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
at org.apache.hadoop.ipc.Client.call(Client.java:788)
... 18 more
2011-03-22 13:22:45,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
2011-03-22 13:22:46,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
2011-03-22 13:22:47,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
2011-03-22 13:22:48,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
2011-03-22 13:22:49,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
2011-03-22 13:22:50,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
2011-03-22 13:22:51,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
2011-03-22 13:22:52,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
2011-03-22 13:22:53,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
2011-03-22 13:22:54,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
2011-03-22 13:22:54,603 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs
java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
at org.apache.hadoop.ipc.Client.call(Client.java:820)
at org.apache.hadoop.ipc.RPC$Invok
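
The log above shows ten "Retrying connect to server" lines about one second apart, followed by a ConnectException. That pattern is consistent with a bounded retry loop that sleeps for a fixed time between attempts. The sketch below illustrates only that pattern. It is not the Hadoop IPC client's code, and `BoundedRetry`, `Attempt`, and `connectWithRetries` are hypothetical names, as are the retry count and sleep used in `main`.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical illustration of a bounded retry loop; not taken from Hadoop.
final class BoundedRetry {

    /** One connection attempt; throws ConnectException when the server refuses. */
    interface Attempt {
        void connect() throws IOException;
    }

    /**
     * Calls connect() until it succeeds or until more than maxRetries attempts
     * have failed. Returns the number of failed attempts before success.
     */
    static int connectWithRetries(Attempt attempt, int maxRetries, long sleepMillis)
            throws IOException, InterruptedException {
        int failures = 0;
        while (true) {
            try {
                attempt.connect();
                return failures;
            } catch (ConnectException e) {
                if (failures >= maxRetries) {
                    throw e; // give up; the caller now sees the failure
                }
                Thread.sleep(sleepMillis);
                System.out.println("Retrying connect. Already tried " + failures + " time(s).");
                failures++;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // A server that always refuses, like the crashed name node in the log.
            // The log suggests about one second between tries; 10 ms keeps the demo short.
            connectWithRetries(() -> { throw new ConnectException("Connection refused"); }, 10, 10L);
        } catch (IOException e) {
            System.out.println("Giving up: " + e.getMessage());
        }
    }
}
```

Once the loop gives up, the exception reaches whatever code made the call, which is where the caller has to decide between logging it and shutting down.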


  • Jean-Daniel Cryans at Mar 29, 2011 at 5:38 pm
    I was expecting it would die; strange that it didn't. Could you provide a
    bigger log? This one basically tells us the NN is gone, but that's
    about it. Please put it on a web server or somewhere else that's
    easily reachable for anyone (e.g. don't post the full thing here).

    Thx,


    J-D
  • Jean-Daniel Cryans at Mar 31, 2011 at 5:20 pm
    (sending back to the list; please don't reply directly to the
    sender, always send back to the mailing list)

    MasterFileSystem has most of the DFS interactions. It seems that
    checkFileSystem is never called (it should be), and splitLog catches
    the exception when splitting (logged as ERROR) but doesn't abort.
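
To make that description concrete, here is a minimal sketch of the behavior being asked for: when a log split fails, check the file system and abort if it is gone, rather than only logging the error. Everything here is a simplified, hypothetical stand-in. `SplitFailureSketch`, `Dfs`, and `Abortable` are invented for illustration, and the method names `splitLog` and `checkFileSystem` are reused only to mirror the message above. This is not the HBase source and not necessarily how the issue was eventually fixed.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical, simplified stand-ins; these are not the HBase classes.
final class SplitFailureSketch {

    /** Stand-in for the part of the master that can shut it down. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /** Stand-in for the DFS operations the master depends on. */
    interface Dfs {
        void splitLog(String serverName) throws IOException;
        boolean isAvailable();
    }

    private final Dfs dfs;
    private final Abortable master;

    SplitFailureSketch(Dfs dfs, Abortable master) {
        this.dfs = dfs;
        this.master = master;
    }

    /** Returns true if the file system answers; aborts the master otherwise. */
    boolean checkFileSystem() {
        boolean available = dfs.isAvailable();
        if (!available) {
            master.abort("Shutting down: file system not available", null);
        }
        return available;
    }

    /** Splits a dead server's logs; a failure triggers a file system check. */
    void splitLog(String serverName) throws IOException {
        try {
            dfs.splitLog(serverName);
        } catch (IOException e) {
            System.err.println("Failed splitting logs of " + serverName + ": " + e.getMessage());
            checkFileSystem(); // aborts the master if the DFS is gone
            throw e;           // do not carry on as if the split had succeeded
        }
    }

    public static void main(String[] args) {
        // A DFS that behaves like a crashed name node.
        Dfs deadDfs = new Dfs() {
            @Override public void splitLog(String serverName) throws IOException {
                throw new ConnectException("Connection refused");
            }
            @Override public boolean isAvailable() {
                return false;
            }
        };
        Abortable master = (why, cause) -> System.out.println("ABORT: " + why);
        try {
            new SplitFailureSketch(deadDfs, master).splitLog("example-regionserver");
        } catch (IOException e) {
            System.out.println("Split failure propagated: " + e.getMessage());
        }
    }
}
```

Rethrowing after the check is a deliberate choice in this sketch: if the split did not happen, the code that asked for it should not go on to reassign regions as though the edits had been recovered.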

    Would you mind opening a jira about this issue and perhaps submitting a patch?

    Thx,

    J-D
    On Thu, Mar 31, 2011 at 5:40 AM, Gaojinchao wrote:
    Thanks, I will try to reproduce it, because last time the INFO log level was not turned on.
    I have a question.
    Which code makes the master kill itself when it finds that the namenode has crashed?

    if (isCarryingRoot()) { // -ROOT-
      try {
        this.services.getAssignmentManager().assignRoot();
      } catch (KeeperException e) {
        this.server.abort("In server shutdown processing, assigning root", e);
        throw new IOException("Aborting", e);
      }
    }

    -----Original Message-----
    From: jdcryans@gmail.com on behalf of Jean-Daniel Cryans
    Sent: March 30, 2011 1:39
    To: user@hbase.apache.org
    Cc: Gaojinchao; Chenjian
    Subject: Re: A lot of data is lost when name node crashed

  • Gaojinchao at Apr 1, 2011 at 1:15 am
    Thanks. Please submit a patch and I can try to test it.
    The Jira is:
    https://issues.apache.org/jira/browse/HBASE-3722


Discussion Overview
group: user @
categories: hbase, hadoop
posted: Mar 29, '11 at 11:29a
active: Apr 1, '11 at 1:15a
posts: 4
users: 2
website: hbase.apache.org

2 users in discussion

Gaojinchao: 2 posts
Jean-Daniel Cryans: 2 posts
