Grokbase Groups HBase user June 2011
FAQ
Hi everybody,

it was not an easy way to run a map reduce job at all, ie if a third party jars are involved...
a good help is the article by cloudera: http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

i still can not use the -libjars argument for running a MR job with 3rd party jars, like described in the first option
for some reason it does not work for me... the tasks fail with the java.lang.ClassNotFoundException, classes of the 3rd party lib are not found

the second option: Include the referenced JAR in the lib subdirectory of the submittable JAR
actually this works fine for me, starting a job from the shell like this: ./bin/hadoop jar /tmp/my.jar package.HBaseReader
not the most elegant way, but finally it works

now i would like to start MR jobs from my web application running on a tomcat, is there an elegant way to do it using 3rd party jars?

the third option described at the article is to include the jars on every tasktracker, which is IMHO not the very best, like the second...


the second question: at the moment i use the TextOutputFormatis the output format, which creates a file in the specified dfs directory: part-r-00000
so i can read id using ./bin/hadoop fs -cat /tmp/requests/part-r-00000 on the shell

how can i get the path to this output file after my job is finished, to process it however... is there another way to collect results of a MR job, a text file is good for humans, but IMHO parsing a text file for results is not the preferable way...

thanks in advance
andre

PS:
versions:
- Linux version 2.6.26-2-amd64 (Debian 2.6.26-25lenny1)
- hadoop-0.20.2-CDH3B4
- hbase-0.90.1-CDH3B4
- zookeeper-3.3.2-CDH3B4

Search Discussions

  • Bijieshan at Jun 22, 2011 at 9:33 am
    1.The Rs 167-6-1-11,20020,1308105402003 finished shutdown at "10:46:37,774":
    10:46:37,774 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of 167-6-1-11,20020,1308105402003

    2.Overwriting happened, it seemed the RS was still exist in the set of AssignmentManager#regions:
    10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 612342de1fe4733f72299d70addb6d11 on serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)

    3.Region was assigned to this dead RS at "10:50:20,671":
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-11,20020,1308105402003

    I'm still working on this issue. I hope anyone can give out any suggestions.

    Thanks!

    Jieshan Bean


    Here's the logs tracked the region of "612342de1fe4733f72299d70addb6d11":

    10:39:51,467 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Async create of unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:39:51,994 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=OFFLINE, ts=1308105591467, server=167-6-1-11,20020,1308105402003
    10:39:52,182 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=OFFLINE, ts=1308105591467
    10:41:43,316 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-11,20020,1308105402003, region=612342de1fe4733f72299d70addb6d11
    10:43:22,938 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:43:22,938 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:43:23,153 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:43:23,157 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-11,20020,1308105402003
    10:43:52,300 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 2994 region(s) that 167-6-1-11,20020,1308105402003 was carrying (skipping 0 regions(s) that are already in transition)
    10:45:54,914 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Creating (or updating) unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. so generated a random one; hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=, dest=167-6-1-12,20020,1308105397174; 2 (online=2, exclude=null) available servers
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. so generated a random one; hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=, dest=167-6-1-12,20020,1308105397174; 2 (online=2, exclude=null) available servers
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-12,20020,1308105397174
    10:45:55,029 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:45:55,063 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:45:55,063 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:45:55,063 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:45:55,072 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 612342de1fe4733f72299d70addb6d11 on serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
    10:45:55,081 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-12,20020,1308105397174
    10:46:37,774 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of 167-6-1-11,20020,1308105402003
    10:48:45,067 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:48:45,067 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. (offlining)
    10:48:45,068 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=167-6-1-12,20020,1308105397174, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/612342de1fe4733f72299d70addb6d11 (region=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., server=167-6-1-12,20020,1308105397174, state=RS_ZK_REGION_CLOSED)
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/612342de1fe4733f72299d70addb6d11 (region=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., server=167-6-1-12,20020,1308105397174, state=RS_ZK_REGION_CLOSED)
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:50:20,527 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 612342de1fe4733f72299d70addb6d11
    10:50:20,580 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=CLOSED, ts=1308106125379
    10:50:20,580 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Creating (or updating) unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.; plan=hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.; plan=hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-11,20020,1308105402003
    10:50:20,692 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=167-6-1-11:20000, region=612342de1fe4733f72299d70addb6d11
    10:50:20,720 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:50:20,743 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:50:20,779 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:51:33,245 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:51:33,245 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:51:33,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:51:33,312 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-11,20020,1308106004710
  • Jean-Daniel Cryans at Jun 22, 2011 at 9:04 pm
    As far as I can tell your server is alive because it opened the region
    just fine:

    Handling transition=M_ZK_REGION_OFFLINE, server=167-6-1-11:20000,
    region=612342de1fe4733f72299d70addb6d11
    Handling transition=RS_ZK_REGION_OPENING,
    server=167-6-1-11,20020,1308106004710,
    region=612342de1fe4733f72299d70addb6d11
    Handling transition=RS_ZK_REGION_OPENING,
    server=167-6-1-11,20020,1308106004710,
    region=612342de1fe4733f72299d70addb6d11
    Handling transition=RS_ZK_REGION_OPENED,
    server=167-6-1-11,20020,1308106004710,
    region=612342de1fe4733f72299d70addb6d11

    It has a different start code from when it was assigned (1308106004710
    vs 1308105402003) which is pretty strange, and on top of that the line
    with M_ZK_REGION_OFFLINE shows port 20000 instead of 20020.

    I believe that this is only a small snippet of a bigger log, right? I
    bet there is more we could learn in there, also it'd be nice if you
    told us what was happening on your cluster at the same time,
    understanding the context would help.

    J-D
    On Wed, Jun 22, 2011 at 2:31 AM, bijieshan wrote:
    1.The Rs 167-6-1-11,20020,1308105402003 finished shutdown at "10:46:37,774":
    10:46:37,774 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of 167-6-1-11,20020,1308105402003

    2.Overwriting happened, it seemed the RS was still exist in the set of AssignmentManager#regions:
    10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 612342de1fe4733f72299d70addb6d11 on serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)

    3.Region was assigned to this dead RS at "10:50:20,671":
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-11,20020,1308105402003

    I'm still working on this issue. I hope anyone can give out any suggestions.

    Thanks!

    Jieshan Bean


    Here's the logs tracked the region of "612342de1fe4733f72299d70addb6d11":

    10:39:51,467 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Async create of unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:39:51,994 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=OFFLINE, ts=1308105591467, server=167-6-1-11,20020,1308105402003
    10:39:52,182 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=OFFLINE, ts=1308105591467
    10:41:43,316 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-11,20020,1308105402003, region=612342de1fe4733f72299d70addb6d11
    10:43:22,938 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:43:22,938 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:43:23,153 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:43:23,157 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-11,20020,1308105402003
    10:43:52,300 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 2994 region(s) that 167-6-1-11,20020,1308105402003 was carrying (skipping 0 regions(s) that are already in transition)
    10:45:54,914 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Creating (or updating) unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. so generated a random one; hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=, dest=167-6-1-12,20020,1308105397174; 2 (online=2, exclude=null) available servers
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. so generated a random one; hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=, dest=167-6-1-12,20020,1308105397174; 2 (online=2, exclude=null) available servers
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-12,20020,1308105397174
    10:45:55,029 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:45:55,063 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:45:55,063 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:45:55,063 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:45:55,072 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 612342de1fe4733f72299d70addb6d11 on serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
    10:45:55,081 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-12,20020,1308105397174
    10:46:37,774 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of 167-6-1-11,20020,1308105402003
    10:48:45,067 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:48:45,067 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. (offlining)
    10:48:45,068 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=167-6-1-12,20020,1308105397174, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/612342de1fe4733f72299d70addb6d11 (region=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., server=167-6-1-12,20020,1308105397174, state=RS_ZK_REGION_CLOSED)
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/612342de1fe4733f72299d70addb6d11 (region=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., server=167-6-1-12,20020,1308105397174, state=RS_ZK_REGION_CLOSED)
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:50:20,527 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 612342de1fe4733f72299d70addb6d11
    10:50:20,580 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=CLOSED, ts=1308106125379
    10:50:20,580 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Creating (or updating) unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.; plan=hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.; plan=hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-11,20020,1308105402003
    10:50:20,692 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=167-6-1-11:20000, region=612342de1fe4733f72299d70addb6d11
    10:50:20,720 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:50:20,743 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:50:20,779 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:51:33,245 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:51:33,245 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:51:33,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:51:33,312 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-11,20020,1308106004710
  • Bijieshan at Jun 23, 2011 at 2:02 am
    Yes, it seems pretty strange.
    You're right, there's so many logs, so I just picked off part of the logs.
    I feel a hiding bug here. And I'm still working on it.

    I feel the function of AssignmentManager#regionOnline has a problem:
    1. ServerShutdownHandler will remove the server from the AssignmentManager.servers.
    2. regionOnline will add the server into AssignmentManager.servers again.

    What about a competition happened here?

    public void regionOnline(HRegionInfo regionInfo, HServerInfo serverInfo) {
    synchronized (this.regionsInTransition) {
    ---------------
    }
    synchronized (this.regions) {
    // Add check
    ---------------
    addToServers(hsiWithoutLoad, regionInfo);
    this.regions.notifyAll();
    }
    // Remove plan if one.
    clearRegionPlan(regionInfo);
    // Update timers for all regions in transition going against this server.
    updateTimers(serverInfo);
    }
    Thanks J-D.

    Jieshan Bean



    ------------------------------------------------

    As far as I can tell your server is alive because it opened the region
    just fine:

    Handling transition=M_ZK_REGION_OFFLINE, server=167-6-1-11:20000,
    region=612342de1fe4733f72299d70addb6d11
    Handling transition=RS_ZK_REGION_OPENING,
    server=167-6-1-11,20020,1308106004710,
    region=612342de1fe4733f72299d70addb6d11
    Handling transition=RS_ZK_REGION_OPENING,
    server=167-6-1-11,20020,1308106004710,
    region=612342de1fe4733f72299d70addb6d11
    Handling transition=RS_ZK_REGION_OPENED,
    server=167-6-1-11,20020,1308106004710,
    region=612342de1fe4733f72299d70addb6d11

    It has a different start code from when it was assigned (1308106004710
    vs 1308105402003) which is pretty strange, and on top of that the line
    with M_ZK_REGION_OFFLINE shows port 20000 instead of 20020.

    I believe that this is only a small snippet of a bigger log, right? I
    bet there is more we could learn in there, also it'd be nice if you
    told us what was happening on your cluster at the same time,
    understanding the context would help.

    J-D
    On Wed, Jun 22, 2011 at 2:31 AM, bijieshan wrote:
    1.The Rs 167-6-1-11,20020,1308105402003 finished shutdown at "10:46:37,774":
    10:46:37,774 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of 167-6-1-11,20020,1308105402003

    2.Overwriting happened, it seemed the RS was still exist in the set of AssignmentManager#regions:
    10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 612342de1fe4733f72299d70addb6d11 on serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)

    3.Region was assigned to this dead RS at "10:50:20,671":
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-11,20020,1308105402003

    I'm still working on this issue. I hope anyone can give out any suggestions.

    Thanks!

    Jieshan Bean


    Here's the logs tracked the region of "612342de1fe4733f72299d70addb6d11":

    10:39:51,467 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Async create of unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:39:51,994 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=OFFLINE, ts=1308105591467, server=167-6-1-11,20020,1308105402003
    10:39:52,182 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=OFFLINE, ts=1308105591467
    10:41:43,316 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-11,20020,1308105402003, region=612342de1fe4733f72299d70addb6d11
    10:43:22,938 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:43:22,938 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:43:23,153 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:43:23,157 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-11,20020,1308105402003
    10:43:52,300 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 2994 region(s) that 167-6-1-11,20020,1308105402003 was carrying (skipping 0 regions(s) that are already in transition)
    10:45:54,914 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Creating (or updating) unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. so generated a random one; hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=, dest=167-6-1-12,20020,1308105397174; 2 (online=2, exclude=null) available servers
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. so generated a random one; hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=, dest=167-6-1-12,20020,1308105397174; 2 (online=2, exclude=null) available servers
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-12,20020,1308105397174
    10:45:55,029 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:45:55,063 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:45:55,063 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:45:55,063 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:45:55,072 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 612342de1fe4733f72299d70addb6d11 on serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
    10:45:55,081 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-12,20020,1308105397174
    10:46:37,774 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of 167-6-1-11,20020,1308105402003
    10:48:45,067 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:48:45,067 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. (offlining)
    10:48:45,068 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=167-6-1-12,20020,1308105397174, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/612342de1fe4733f72299d70addb6d11 (region=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., server=167-6-1-12,20020,1308105397174, state=RS_ZK_REGION_CLOSED)
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/612342de1fe4733f72299d70addb6d11 (region=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., server=167-6-1-12,20020,1308105397174, state=RS_ZK_REGION_CLOSED)
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:50:20,527 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 612342de1fe4733f72299d70addb6d11
    10:50:20,580 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=CLOSED, ts=1308106125379
    10:50:20,580 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Creating (or updating) unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.; plan=hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.; plan=hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-11,20020,1308105402003
    10:50:20,692 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=167-6-1-11:20000, region=612342de1fe4733f72299d70addb6d11
    10:50:20,720 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:50:20,743 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:50:20,779 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:51:33,245 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:51:33,245 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:51:33,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:51:33,312 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-11,20020,1308106004710
  • Bijieshan at Jun 24, 2011 at 2:12 am
    The following steps can recreate the problem:
    1. There's thousands of regions in the cluster.
    2. Stop the cluster.
    3. Start the cluster. Killing one regionserver while the regions were opening. Restarted it after 10 seconds.

    The shutted regionserver will appear in the online list again:

    2011-06-23 14:14:30,775 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: 167-6-1-12,20020,1308803390123=2220, 167-6-1-13,20020,1308803391742=2374, 167-6-1-11,20020,1308803386333=2205, 167-6-1-13,20020,1308803514394=2183

    Two regionservers had the same hostname but different startcode:
    167-6-1-13,20020,1308803391742=2374
    167-6-1-13,20020,1308803514394=2183

    Some other related logs:

    2011-06-23 12:31:03,192 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for 167-6-1-13,20020,1308803391742
    2011-06-23 12:31:06,433 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 2468 region(s) that 167-6-1-13,20020,1308803391742 was carrying (skipping 0 regions(s) that are already in transition)


    2011-06-23 12:31:54,687 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=167-6-1-13,20020,1308803514394, regionCount=0, userLoad=false

    2011-06-23 12:39:30,773 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: 167-6-1-12,20020,1308803390123=2942, 167-6-1-13,20020,1308803391742=2374, 167-6-1-11,20020,1308803386333=2917, 167-6-1-13,20020,1308803514394=749


    2374
    2011-06-23 12:32:41,315 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 2d655e795ea0044885ad60d2ea057113 on serverName=167-6-1-13,20020,1308803391742, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
    2011-06-23 12:32:41,315 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting ac98d67ec85c370a39fcf8b9b2ce76d1 on serverName=167-6-1-13,20020,1308803391742, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
    ...
    2011-06-23 12:34:30,613 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 24e833b7503a74c66f1c62c72549cb70 on serverName=167-6-1-13,20020,1308803391742, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
    2011-06-23 12:34:30,681 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting d6c4392a580527d23018576357abed3e on serverName=167-6-1-13,20020,1308803391742, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
    2011-06-23 12:34:30,771 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 832f72e0155c6a664c1a05a2b57e8687 on serverName=167-6-1-13,20020,1308803391742, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)



    Jieshan Bean


    --------------------------------------------------

    Yes, it seems pretty strange.
    You're right, there's so many logs, so I just picked off part of the logs.
    I feel a hiding bug here. And I'm still working on it.

    I feel the function of AssignmentManager#regionOnline has a problem:
    1. ServerShutdownHandler will remove the server from the AssignmentManager.servers.
    2. regionOnline will add the server into AssignmentManager.servers again.

    What about a competition happened here?

    public void regionOnline(HRegionInfo regionInfo, HServerInfo serverInfo) {
    synchronized (this.regionsInTransition) {
    ---------------
    }
    synchronized (this.regions) {
    // Add check
    ---------------
    addToServers(hsiWithoutLoad, regionInfo);
    this.regions.notifyAll();
    }
    // Remove plan if one.
    clearRegionPlan(regionInfo);
    // Update timers for all regions in transition going against this server.
    updateTimers(serverInfo);
    }
    Thanks J-D.

    Jieshan Bean



    ------------------------------------------------

    As far as I can tell your server is alive because it opened the region
    just fine:

    Handling transition=M_ZK_REGION_OFFLINE, server=167-6-1-11:20000,
    region=612342de1fe4733f72299d70addb6d11
    Handling transition=RS_ZK_REGION_OPENING,
    server=167-6-1-11,20020,1308106004710,
    region=612342de1fe4733f72299d70addb6d11
    Handling transition=RS_ZK_REGION_OPENING,
    server=167-6-1-11,20020,1308106004710,
    region=612342de1fe4733f72299d70addb6d11
    Handling transition=RS_ZK_REGION_OPENED,
    server=167-6-1-11,20020,1308106004710,
    region=612342de1fe4733f72299d70addb6d11

    It has a different start code from when it was assigned (1308106004710
    vs 1308105402003) which is pretty strange, and on top of that the line
    with M_ZK_REGION_OFFLINE shows port 20000 instead of 20020.

    I believe that this is only a small snippet of a bigger log, right? I
    bet there is more we could learn in there, also it'd be nice if you
    told us what was happening on your cluster at the same time,
    understanding the context would help.

    J-D
    On Wed, Jun 22, 2011 at 2:31 AM, bijieshan wrote:
    1.The Rs 167-6-1-11,20020,1308105402003 finished shutdown at "10:46:37,774":
    10:46:37,774 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of 167-6-1-11,20020,1308105402003

    2.Overwriting happened, it seemed the RS was still exist in the set of AssignmentManager#regions:
    10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 612342de1fe4733f72299d70addb6d11 on serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)

    3.Region was assigned to this dead RS at "10:50:20,671":
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-11,20020,1308105402003

    I'm still working on this issue. I hope anyone can give out any suggestions.

    Thanks!

    Jieshan Bean


    Here's the logs tracked the region of "612342de1fe4733f72299d70addb6d11":

    10:39:51,467 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Async create of unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:39:51,994 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=OFFLINE, ts=1308105591467, server=167-6-1-11,20020,1308105402003
    10:39:52,182 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=OFFLINE, ts=1308105591467
    10:41:43,316 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-11,20020,1308105402003, region=612342de1fe4733f72299d70addb6d11
    10:43:22,938 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:43:22,938 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:43:23,153 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:43:23,157 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-11,20020,1308105402003
    10:43:52,300 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 2994 region(s) that 167-6-1-11,20020,1308105402003 was carrying (skipping 0 regions(s) that are already in transition)
    10:45:54,914 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Creating (or updating) unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. so generated a random one; hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=, dest=167-6-1-12,20020,1308105397174; 2 (online=2, exclude=null) available servers
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. so generated a random one; hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=, dest=167-6-1-12,20020,1308105397174; 2 (online=2, exclude=null) available servers
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-12,20020,1308105397174
    10:45:55,029 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:45:55,063 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:45:55,063 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:45:55,063 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:45:55,072 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 612342de1fe4733f72299d70addb6d11 on serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
    10:45:55,081 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-12,20020,1308105397174
    10:46:37,774 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of 167-6-1-11,20020,1308105402003
    10:48:45,067 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:48:45,067 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. (offlining)
    10:48:45,068 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=167-6-1-12,20020,1308105397174, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/612342de1fe4733f72299d70addb6d11 (region=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., server=167-6-1-12,20020,1308105397174, state=RS_ZK_REGION_CLOSED)
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/612342de1fe4733f72299d70addb6d11 (region=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., server=167-6-1-12,20020,1308105397174, state=RS_ZK_REGION_CLOSED)
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:50:20,527 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 612342de1fe4733f72299d70addb6d11
    10:50:20,580 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=CLOSED, ts=1308106125379
    10:50:20,580 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Creating (or updating) unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.; plan=hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.; plan=hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-11,20020,1308105402003
    10:50:20,692 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=167-6-1-11:20000, region=612342de1fe4733f72299d70addb6d11
    10:50:20,720 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:50:20,743 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:50:20,779 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:51:33,245 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:51:33,245 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:51:33,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:51:33,312 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-11,20020,1308106004710
  • Jean-Daniel Cryans at Jun 24, 2011 at 6:06 pm
    Can you instrument AssignmentManager.addToServers and see if that's
    really re-adding that node?

    Thanks for digging,

    J-D
    On Thu, Jun 23, 2011 at 7:11 PM, bijieshan wrote:
    The following steps can recreate the problem:
    1. There's thousands of regions in the cluster.
    2. Stop the cluster.
    3. Start the cluster. Killing one regionserver while the regions were opening. Restarted it after 10 seconds.

    The shutted regionserver will appear in the online list again:

    2011-06-23 14:14:30,775 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: 167-6-1-12,20020,1308803390123=2220, 167-6-1-13,20020,1308803391742=2374, 167-6-1-11,20020,1308803386333=2205, 167-6-1-13,20020,1308803514394=2183

    Two regionservers had the same hostname but different startcode:
    167-6-1-13,20020,1308803391742=2374
    167-6-1-13,20020,1308803514394=2183

    Some other related logs:

    2011-06-23 12:31:03,192 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for 167-6-1-13,20020,1308803391742
    2011-06-23 12:31:06,433 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 2468 region(s) that 167-6-1-13,20020,1308803391742 was carrying (skipping 0 regions(s) that are already in transition)


    2011-06-23 12:31:54,687 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=167-6-1-13,20020,1308803514394, regionCount=0, userLoad=false

    2011-06-23 12:39:30,773 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: 167-6-1-12,20020,1308803390123=2942, 167-6-1-13,20020,1308803391742=2374, 167-6-1-11,20020,1308803386333=2917, 167-6-1-13,20020,1308803514394=749


    2374
    2011-06-23 12:32:41,315 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 2d655e795ea0044885ad60d2ea057113 on serverName=167-6-1-13,20020,1308803391742, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
    2011-06-23 12:32:41,315 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting ac98d67ec85c370a39fcf8b9b2ce76d1 on serverName=167-6-1-13,20020,1308803391742, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
    ...
    2011-06-23 12:34:30,613 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 24e833b7503a74c66f1c62c72549cb70 on serverName=167-6-1-13,20020,1308803391742, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
    2011-06-23 12:34:30,681 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting d6c4392a580527d23018576357abed3e on serverName=167-6-1-13,20020,1308803391742, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
    2011-06-23 12:34:30,771 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 832f72e0155c6a664c1a05a2b57e8687 on serverName=167-6-1-13,20020,1308803391742, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)



    Jieshan Bean


    --------------------------------------------------

    Yes, it seems pretty strange.
    You're right, there's so many logs, so I just picked off part of the logs.
    I feel a hiding bug here. And I'm still working on it.

    I feel the function of AssignmentManager#regionOnline has a problem:
    1. ServerShutdownHandler will remove the server from the AssignmentManager.servers.
    2. regionOnline will add the server into AssignmentManager.servers again.

    What about a competition happened here?

    public void regionOnline(HRegionInfo regionInfo, HServerInfo serverInfo) {
    synchronized (this.regionsInTransition) {
    ---------------
    }
    synchronized (this.regions) {
    // Add check
    ---------------
    addToServers(hsiWithoutLoad, regionInfo);
    this.regions.notifyAll();
    }
    // Remove plan if one.
    clearRegionPlan(regionInfo);
    // Update timers for all regions in transition going against this server.
    updateTimers(serverInfo);
    }
    Thanks J-D.

    Jieshan Bean



    ------------------------------------------------

    As far as I can tell your server is alive because it opened the region
    just fine:

    Handling transition=M_ZK_REGION_OFFLINE, server=167-6-1-11:20000,
    region=612342de1fe4733f72299d70addb6d11
    Handling transition=RS_ZK_REGION_OPENING,
    server=167-6-1-11,20020,1308106004710,
    region=612342de1fe4733f72299d70addb6d11
    Handling transition=RS_ZK_REGION_OPENING,
    server=167-6-1-11,20020,1308106004710,
    region=612342de1fe4733f72299d70addb6d11
    Handling transition=RS_ZK_REGION_OPENED,
    server=167-6-1-11,20020,1308106004710,
    region=612342de1fe4733f72299d70addb6d11

    It has a different start code from when it was assigned (1308106004710
    vs 1308105402003) which is pretty strange, and on top of that the line
    with M_ZK_REGION_OFFLINE shows port 20000 instead of 20020.

    I believe that this is only a small snippet of a bigger log, right? I
    bet there is more we could learn in there, also it'd be nice if you
    told us what was happening on your cluster at the same time,
    understanding the context would help.

    J-D
    On Wed, Jun 22, 2011 at 2:31 AM, bijieshan wrote:
    1.The Rs 167-6-1-11,20020,1308105402003 finished shutdown at "10:46:37,774":
    10:46:37,774 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of 167-6-1-11,20020,1308105402003

    2.Overwriting happened, it seemed the RS was still exist in the set of AssignmentManager#regions:
    10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 612342de1fe4733f72299d70addb6d11 on serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)

    3.Region was assigned to this dead RS at "10:50:20,671":
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-11,20020,1308105402003

    I'm still working on this issue. I hope anyone can give out any suggestions.

    Thanks!

    Jieshan Bean


    Here's the logs tracked the region of "612342de1fe4733f72299d70addb6d11":

    10:39:51,467 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Async create of unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:39:51,994 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=OFFLINE, ts=1308105591467, server=167-6-1-11,20020,1308105402003
    10:39:52,182 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=OFFLINE, ts=1308105591467
    10:41:43,316 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-11,20020,1308105402003, region=612342de1fe4733f72299d70addb6d11
    10:43:22,938 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:43:22,938 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:43:23,153 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:43:23,157 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-11,20020,1308105402003
    10:43:52,300 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 2994 region(s) that 167-6-1-11,20020,1308105402003 was carrying (skipping 0 regions(s) that are already in transition)
    10:45:54,914 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Creating (or updating) unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. so generated a random one; hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=, dest=167-6-1-12,20020,1308105397174; 2 (online=2, exclude=null) available servers
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. so generated a random one; hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=, dest=167-6-1-12,20020,1308105397174; 2 (online=2, exclude=null) available servers
    10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-12,20020,1308105397174
    10:45:55,029 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:45:55,063 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:45:55,063 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:45:55,063 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:45:55,072 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 612342de1fe4733f72299d70addb6d11 on serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
    10:45:55,081 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-12,20020,1308105397174
    10:46:37,774 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of 167-6-1-11,20020,1308105402003
    10:48:45,067 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:48:45,067 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. (offlining)
    10:48:45,068 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=167-6-1-12,20020,1308105397174, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/612342de1fe4733f72299d70addb6d11 (region=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., server=167-6-1-12,20020,1308105397174, state=RS_ZK_REGION_CLOSED)
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/612342de1fe4733f72299d70addb6d11 (region=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., server=167-6-1-12,20020,1308105397174, state=RS_ZK_REGION_CLOSED)
    10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
    10:50:20,527 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 612342de1fe4733f72299d70addb6d11
    10:50:20,580 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=CLOSED, ts=1308106125379
    10:50:20,580 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Creating (or updating) unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.; plan=hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.; plan=hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
    10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-11,20020,1308105402003
    10:50:20,692 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=167-6-1-11:20000, region=612342de1fe4733f72299d70addb6d11
    10:50:20,720 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:50:20,743 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:50:20,779 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
    10:51:33,245 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
    10:51:33,245 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
    10:51:33,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000 Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
    10:51:33,312 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-11,20020,1308106004710
  • Andre Reiter at Jun 22, 2011 at 9:48 am

    schrieb Andre Reiter:
    Hi everybody,

    it was not an easy way to run a map reduce job at all, ie if a third party jars are involved...
    a good help is the article by cloudera: http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

    i still can not use the -libjars argument for running a MR job with 3rd party jars, like described in the first option
    for some reason it does not work for me... the tasks fail with the java.lang.ClassNotFoundException, classes of the 3rd party lib are not found
    ...
    ups... :-)
    the reasen for this was creating a new config
    Configuration conf = HBaseConfiguration.create();
    instead of using the one provided by the Configured super class
    Configuration conf = getConf();

    now the -libjars works like a charm :-)

    but the other questions are still there...
  • Andre Reiter at Jun 24, 2011 at 4:41 am
    Hi everybody,

    no suggestiona about that questions?
    how to submit a MR out of my application, and not manually from a shell useing ./bin/hadoop jar ... ?

    best regards
    andre



    Andre Reiter wrote:
    now i would like to start MR jobs from my web application running on a tomcat, is there an elegant way to do it?

    the second question: at the moment i use the TextOutputFormatis the output format, which creates a file in the specified dfs directory: part-r-00000
    so i can read id using ./bin/hadoop fs -cat /tmp/requests/part-r-00000 on the shell

    how can i get the path to this output file after my job is finished, to process it however... is there another way to collect results of a MR job, a text file is good for humans, but IMHO parsing a text file for results is not the preferable way...

    thanks in advance
    andre

    PS:
    versions:
    - Linux version 2.6.26-2-amd64 (Debian 2.6.26-25lenny1)
    - hadoop-0.20.2-CDH3B4
    - hbase-0.90.1-CDH3B4
    - zookeeper-3.3.2-CDH3B4
  • Doug Meil at Jun 24, 2011 at 12:29 pm
    Hi there-

    Take a look at this for starters...

    http://hbase.apache.org/book.html#mapreduce


    if you do job.waitForCompletion(true); it will execute synchronously. If you do job.waitForCompletion(false) it will fire and forget. A simple pattern is to spin off a thread where it executes job.waitFor..(true) and then you can pick up the results.


    -----Original Message-----
    From: Andre Reiter
    Sent: Friday, June 24, 2011 12:41 AM
    To: user@hbase.apache.org
    Subject: Re: Running MapReduce from a web application

    Hi everybody,

    no suggestiona about that questions?
    how to submit a MR out of my application, and not manually from a shell useing ./bin/hadoop jar ... ?

    best regards
    andre



    Andre Reiter wrote:
    now i would like to start MR jobs from my web application running on a tomcat, is there an elegant way to do it?

    the second question: at the moment i use the TextOutputFormatis the
    output format, which creates a file in the specified dfs directory:
    part-r-00000 so i can read id using ./bin/hadoop fs -cat
    /tmp/requests/part-r-00000 on the shell

    how can i get the path to this output file after my job is finished, to process it however... is there another way to collect results of a MR job, a text file is good for humans, but IMHO parsing a text file for results is not the preferable way...

    thanks in advance
    andre

    PS:
    versions:
    - Linux version 2.6.26-2-amd64 (Debian 2.6.26-25lenny1)
    - hadoop-0.20.2-CDH3B4
    - hbase-0.90.1-CDH3B4
    - zookeeper-3.3.2-CDH3B4
  • Andre Reiter at Jun 24, 2011 at 9:48 pm
    Hi Doug,

    thanks a lot for your reply
    the point is clear hoe to create a job instance and to configure it using the TableMapReduceUtil.initTableMapperJob
    actually our job is working just perfectly, even the third party libs are simple to import using TableMapReduceUtil.addDependencyJars

    the problem is about the starting the MR job...

    at the moment we do it this way:
    - set HADOOP_CLASSPATH with hbase, zookeeper, and all third party jars
    - execute "./bin/hadoop jar /tmp/map_reduce_v1.jar package1.MRDriver1"

    that works like a charm, the question is now, how to start the job from our web application running on tomcat ???

    one option is may be to fork a new process, like this:
    ProcessBuilder pb = new ProcessBuilder("/opt/hadoop/bin/hadoop", "jar", "/tmp/map_reduce_v1.jar", "package1.MRDriver1");
    ...
    // configure ProcessBuilder
    Process p = processBuilder.start();

    but this does not seem to be very elegant to us... does it?

    so how to start a job from a running app, in the same process without forking

    andre



    Doug Meil wrote:
    Hi there-

    Take a look at this for starters...

    http://hbase.apache.org/book.html#mapreduce


    if you do job.waitForCompletion(true); it will execute synchronously. If you do job.waitForCompletion(false) it will fire and forget. A simple pattern is to spin off a thread where it executes job.waitFor..(true) and then you can pick up the results.


    -----Original Message-----
    From: Andre Reiter
    Sent: Friday, June 24, 2011 12:41 AM
    To: user@hbase.apache.org
    Subject: Re: Running MapReduce from a web application

    Hi everybody,

    no suggestiona about that questions?
    how to submit a MR out of my application, and not manually from a shell useing ./bin/hadoop jar ... ?

    best regards
    andre



    Andre Reiter wrote:
    now i would like to start MR jobs from my web application running on a tomcat, is there an elegant way to do it?

    the second question: at the moment i use the TextOutputFormatis the
    output format, which creates a file in the specified dfs directory:
    part-r-00000 so i can read id using ./bin/hadoop fs -cat
    /tmp/requests/part-r-00000 on the shell

    how can i get the path to this output file after my job is finished, to process it however... is there another way to collect results of a MR job, a text file is good for humans, but IMHO parsing a text file for results is not the preferable way...

    thanks in advance
    andre

    PS:
    versions:
    - Linux version 2.6.26-2-amd64 (Debian 2.6.26-25lenny1)
    - hadoop-0.20.2-CDH3B4
    - hbase-0.90.1-CDH3B4
    - zookeeper-3.3.2-CDH3B4
  • Jonathan Holloway at Jun 24, 2011 at 10:12 pm
    Take a look at Yahoo's Oozie, it's fairly trivial to build a workflow for a map reduce job and submit it via the web service for processing, it's a lot easier than using ProcessBuilder also.

    Jon.
    On 24 Jun 2011, at 22:47, Andre Reiter wrote:

    Hi Doug,

    thanks a lot for your reply
    the point is clear hoe to create a job instance and to configure it using the TableMapReduceUtil.initTableMapperJob
    actually our job is working just perfectly, even the third party libs are simple to import using TableMapReduceUtil.addDependencyJars

    the problem is about the starting the MR job...

    at the moment we do it this way:
    - set HADOOP_CLASSPATH with hbase, zookeeper, and all third party jars
    - execute "./bin/hadoop jar /tmp/map_reduce_v1.jar package1.MRDriver1"

    that works like a charm, the question is now, how to start the job from our web application running on tomcat ???

    one option is may be to fork a new process, like this:
    ProcessBuilder pb = new ProcessBuilder("/opt/hadoop/bin/hadoop", "jar", "/tmp/map_reduce_v1.jar", "package1.MRDriver1");
    ...
    // configure ProcessBuilder
    Process p = processBuilder.start();

    but this does not seem to be very elegant to us... does it?

    so how to start a job from a running app, in the same process without forking

    andre



    Doug Meil wrote:
    Hi there-

    Take a look at this for starters...

    http://hbase.apache.org/book.html#mapreduce


    if you do job.waitForCompletion(true); it will execute synchronously. If you do job.waitForCompletion(false) it will fire and forget. A simple pattern is to spin off a thread where it executes job.waitFor..(true) and then you can pick up the results.


    -----Original Message-----
    From: Andre Reiter
    Sent: Friday, June 24, 2011 12:41 AM
    To: user@hbase.apache.org
    Subject: Re: Running MapReduce from a web application

    Hi everybody,

    no suggestiona about that questions?
    how to submit a MR out of my application, and not manually from a shell useing ./bin/hadoop jar ... ?

    best regards
    andre



    Andre Reiter wrote:
    now i would like to start MR jobs from my web application running on a tomcat, is there an elegant way to do it?

    the second question: at the moment i use the TextOutputFormatis the
    output format, which creates a file in the specified dfs directory:
    part-r-00000 so i can read id using ./bin/hadoop fs -cat
    /tmp/requests/part-r-00000 on the shell

    how can i get the path to this output file after my job is finished, to process it however... is there another way to collect results of a MR job, a text file is good for humans, but IMHO parsing a text file for results is not the preferable way...

    thanks in advance
    andre

    PS:
    versions:
    - Linux version 2.6.26-2-amd64 (Debian 2.6.26-25lenny1)
    - hadoop-0.20.2-CDH3B4
    - hbase-0.90.1-CDH3B4
    - zookeeper-3.3.2-CDH3B4

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categorieshbase, hadoop
postedJun 22, '11 at 9:05a
activeJun 24, '11 at 10:12p
posts11
users5
websitehbase.apache.org

People

Translate

site design / logo © 2022 Grokbase