Thanks,
-Jack
On Mon, Oct 31, 2011 at 9:49 AM, stack (Updated) (JIRA) wrote:
  [ https://issues.apache.org/jira/browse/HBASE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-4695:
-------------------------
  Fix Version/s: 0.92.0
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
WAL logs get deleted before region server can fully flush
---------------------------------------------------------
        Key: HBASE-4695
        URL: https://issues.apache.org/jira/browse/HBASE-4695
      Project: HBase
     Issue Type: Bug
     Components: wal
  Affects Versions: 0.90.4
      Reporter: jack levin
      Assignee: gaojinchao
      Priority: Blocker
      Fix For: 0.92.0, 0.90.5
    Attachments: HBASE-4695_Trunk_V2.patch, HBASE-4695_branch90_trial.patch, hbase-4695-0.92.txt
To reproduce the problem, do the following:
1. Check the /hbase/.logs/XXXX directory to confirm that you have WAL logs for the region server you are shutting down.
2. Run kill <pid> (where pid is the regionserver pid).
3. Watch the regionserver log as it starts flushing; it shows how many regions are left to close:
09:36:54,665 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 489 regions to close
09:56:35,779 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 116 regions to close
4. Check /hbase/.logs/XXXX again -- you will notice that it has disappeared.
5. Check the namenode logs:
09:26:41,607 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root ip=/10.101.1.5 cmd=delete src=/hbase/.logs/rdaa5.prod.imageshack.com,60020,1319749
Note that if you kill -9 the RS now, or it crashes during the flush, you won't have any WAL logs to replay. We need to make sure that logs are deleted or moved out only after the RS has fully flushed. Otherwise it's possible to lose data.
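To make the requested ordering concrete, here is a minimal toy sketch of the invariant (WAL removal is refused while any region is still unflushed). This is not HBase code and not the attached patch; the class and method names (ToyRegionServer, flush_region, try_delete_wal) are hypothetical and exist only for illustration.

```python
class ToyRegionServer:
    """Toy model of the shutdown ordering; not actual HBase code."""

    def __init__(self, regions):
        # Regions whose memstore contents have not been flushed yet.
        self.unflushed = set(regions)
        # Whether the (hypothetical) WAL directory still exists.
        self.wal_present = True

    def flush_region(self, region):
        # Mark one region as fully flushed to disk.
        self.unflushed.discard(region)

    def try_delete_wal(self):
        # Refuse to delete the WAL while any region is still unflushed,
        # so a crash mid-flush still leaves logs to replay.
        if self.unflushed:
            return False
        self.wal_present = False
        return True
```

In this model, a call to try_delete_wal() made while regions are still closing returns False and leaves the WAL in place, which is the behavior the report asks for.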
