Hi,
We use HBase 0.90.2 in production (10 servers).

We run a map/reduce job on a daily basis; its reducers insert data into
HBase. The insertion job failed to insert data into HBase.
Here is one of the region servers' logs:
http://pastebin.com/raw.php?i=VF2bSMYd

Additional information and questions:
1) We disabled automatic major compaction and run major compaction manually,
but in the log file I found this entry:

2011-08-07 20:57:20,706 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 5 file(s), new file=hdfs://hadoop-master.infolinks.local:8000/hbase/URLS/70c4ed1855cee6201e583662272f7a46/searches/6451756610532158137, size=6.7m; total size for store is 6.7m

We start major compaction at 00:00 every day, but this entry was logged
at 20:57:20, so how can I check whether the major compaction has finished?
And what could be the reason for a major compaction starting at that time?
2) There are a lot of exceptions like this:
2011-08-07 01:22:34,821 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 8041 caught: java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)

What does this exception mean, and is this normal behaviour?
3) There are log entries like this:
2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: URLS,20110802_budnmarys.squarespace.com/picture-gallery/miscellaneous-gallery/1138727,1312377360131.e302bc31e326308031a82e9eca6e0b6a. state=OFFLINE, ts=1312726415824
2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OFFLINE for too long, reassigning URLS,20110802_budnmarys.squarespace.com/picture-gallery/miscellaneous-gallery/1138727,1312377360131.e302bc31e326308031a82e9eca6e0b6a. to a random server
2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: URLS,20110509_e,1305018012046.e48c6df0a31c41f482bcaccf71244ccb. state=OFFLINE, ts=1312726415824
2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OFFLINE for too long, reassigning URLS,20110509_e,1305018012046.e48c6df0a31c41f482bcaccf71244ccb. to a random server
2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: URLS,20110731_gg,1312187408164.e7fa3b00af458db5af93d5c475712f62. state=OFFLINE, ts=1312726415824

What does "Regions in transition timed out" mean, and is this correct
behaviour?
Thanks in advance,
Oleg





  • Stack at Aug 8, 2011 at 10:45 pm

    On Sun, Aug 7, 2011 at 12:28 PM, Oleg Ruchovets wrote:
    Here is one of the region servers' logs:
    http://pastebin.com/raw.php?i=VF2bSMYd
    I see this, Oleg: "Caused by: java.lang.OutOfMemoryError: Java heap space"

    Additional information and questions:
    1) We disabled automatic major compaction and run major compaction manually,
    but in the log file I found this entry:

    2011-08-07 20:57:20,706 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 5 file(s), new file=hdfs://hadoop-master.infolinks.local:8000/hbase/URLS/70c4ed1855cee6201e583662272f7a46/searches/6451756610532158137, size=6.7m; total size for store is 6.7m
    A minor compaction can be promoted to a major compaction if it ends up
    picking all of the store's files (look earlier in the log: it will start
    off as an 'ordinary' compaction and then later become a 'major' one).
    We start major compaction at 00:00 every day, but this entry was logged
    at 20:57:20, so how can I check whether the major compaction has finished?
    The compaction is async. Currently no flag is set on completion. This
    is an issue we need to figure out an answer for.
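Since no completion flag is set, one workaround is to scan the region server logs for the "Completed major compaction" INFO entries written after the nightly start time. The following is a minimal Python sketch, not an HBase tool: it assumes each entry sits on one physical line in the raw log (the mail client wrapped them here) and that store files live under `/hbase/<table>/<encoded region>/<family>/`, as in the path in the entry above. The function name `completed_major_compactions` is hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, Tuple

# Matches the Store "Completed major compaction" INFO entry quoted above.
# Assumption: the whole entry is on one line in the raw region server log.
COMPLETED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"org\.apache\.hadoop\.hbase\.regionserver\.Store: "
    r"Completed major compaction of (?P<files>\d+) file\(s\), "
    r"new file=\S*/hbase/(?P<table>[^/]+)/(?P<region>[0-9a-f]+)/(?P<family>[^/]+)/\S+, "
)


def completed_major_compactions(
    lines: Iterable[str], since: datetime
) -> Iterator[Tuple[datetime, str, str, str, int]]:
    """Yield (time, table, encoded region, family, file count) for each
    major compaction logged as completed at or after `since`."""
    for line in lines:
        m = COMPLETED.match(line)
        if m is None:
            continue  # not a completion entry (or a wrapped continuation line)
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        if ts >= since:
            yield (ts, m.group("table"), m.group("region"),
                   m.group("family"), int(m.group("files")))
```

Comparing the regions reported this way against the regions the nightly job asked to compact gives a rough progress check; it cannot tell you about compactions that are still queued.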
    And what could be the reason for a major compaction starting at that time?
    2) There are a lot of exceptions like this:
    2011-08-07 01:22:34,821 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 8041 caught: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)

    What does this exception mean, and is this normal behaviour?

    The client has given up listening. Do you see a corresponding timeout
    around the same time on the client side?
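One way to look for the matching client-side timeout is to line up the timestamps of the server-side ClosedChannelException warnings with timeout messages in the reducers' task logs. A minimal Python sketch, assuming both logs start each entry with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp, that server and client clocks are roughly in sync, and that the client-side failure mentions `java.net.SocketTimeoutException` (an assumption; match whatever the reducer logs actually contain). The helper names `event_times` and `correlate` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # e.g. "2011-08-07 01:22:34,821"


def event_times(lines: Iterable[str], marker: str) -> List[datetime]:
    """Timestamps of log entries whose timestamped first line contains `marker`."""
    times = []
    for line in lines:
        if marker not in line:
            continue
        try:
            times.append(datetime.strptime(line[:23], TS_FORMAT))
        except ValueError:
            pass  # continuation line (e.g. "Caused by: ...") without a timestamp
    return times


def correlate(server: List[datetime], client: List[datetime],
              window: timedelta) -> List[Tuple[datetime, List[datetime]]]:
    """Pair each server-side event with the client-side events within `window`."""
    return [(s, [c for c in client if abs(c - s) <= window]) for s in server]
```

A server-side warning with an empty client list suggests the client gave up for some other reason (or logged it elsewhere); the window size is a judgment call.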


    3) There are log entries like this:
    2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: URLS,20110802_budnmarys.squarespace.com/picture-gallery/miscellaneous-gallery/1138727,1312377360131.e302bc31e326308031a82e9eca6e0b6a. state=OFFLINE, ts=1312726415824
    2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OFFLINE for too long, reassigning URLS,20110802_budnmarys.squarespace.com/picture-gallery/miscellaneous-gallery/1138727,1312377360131.e302bc31e326308031a82e9eca6e0b6a. to a random server
    2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: URLS,20110509_e,1305018012046.e48c6df0a31c41f482bcaccf71244ccb. state=OFFLINE, ts=1312726415824
    2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OFFLINE for too long, reassigning URLS,20110509_e,1305018012046.e48c6df0a31c41f482bcaccf71244ccb. to a random server
    2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: URLS,20110731_gg,1312187408164.e7fa3b00af458db5af93d5c475712f62. state=OFFLINE, ts=1312726415824

    What does "Regions in transition timed out" mean, and is this correct
    behaviour?

    Does it go on without ever resolving? If so, that is not usually a
    good sign. Some recent issues have addressed this, with fixes in
    0.90.4, which should be out soon (check its release notes for related
    issues).
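To check whether the timeouts ever resolve, one can count how often each region shows up in "Regions in transition timed out" entries across the master log: a region that keeps reappearing has not settled. A minimal Python sketch, assuming one entry per physical line in the raw master log and the region-name layout visible above (`table,startkey,regionid.encodedname.`). The `ts` field looks like epoch milliseconds (1312726415824 is 2011-08-07 14:13:35.824 UTC). The helper names and the "stuck" threshold are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

# Region names look like "table,startkey,regionid.encodedname." in the log above.
RIT_TIMEOUT = re.compile(
    r"Regions in transition timed out:\s*(?P<table>[^,]+),(?P<startkey>.*),"
    r"(?P<regionid>\d+)\.(?P<encoded>[0-9a-f]+)\.\s+"
    r"state=(?P<state>\w+), ts=(?P<ts>\d+)"
)


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime without float rounding."""
    return (datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
            + timedelta(milliseconds=ms % 1000))


def rit_timeouts(lines: Iterable[str]) -> Tuple[Counter, Dict[str, datetime]]:
    """Count timeouts per encoded region name and keep the latest state timestamp."""
    counts: Counter = Counter()
    last_ts: Dict[str, datetime] = {}
    for line in lines:
        m = RIT_TIMEOUT.search(line)
        if m:
            counts[m.group("encoded")] += 1
            last_ts[m.group("encoded")] = epoch_ms_to_utc(int(m.group("ts")))
    return counts, last_ts


def stuck_regions(counts: Counter, threshold: int) -> List[str]:
    """Regions that timed out at least `threshold` times (hypothetical cut-off)."""
    return sorted(region for region, n in counts.items() if n >= threshold)
```

The "Region has been OFFLINE for too long" lines are deliberately not counted, so each timeout is counted once.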

    St.Ack
  • Oleg Ruchovets at Aug 9, 2011 at 11:09 pm

    On Tue, Aug 9, 2011 at 1:44 AM, Stack wrote:
    On Sun, Aug 7, 2011 at 12:28 PM, Oleg Ruchovets wrote:
    Here is one of the region servers' logs:
    http://pastebin.com/raw.php?i=VF2bSMYd
    I see this, Oleg: "Caused by: java.lang.OutOfMemoryError: Java heap space"
    Yes, I saw this too, but I don't think it is the root of the problem. I
    think it is a result of the server being busy compacting files and not
    being able to handle insertions into HBase at the same time. Does that
    make sense?
    If not, what is the way to get more details about this issue? I thought
    about profiling, but we have 10 machines and I don't know which region
    server could get the OutOfMemoryError. What is the best practice for
    profiling HBase's memory usage?
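Before profiling, it may help to find out which of the ten region servers actually hit the OutOfMemoryError. A minimal Python sketch, assuming the region server logs have been copied into one local directory with one file per host; the directory layout, the `*.log*` glob and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, List

OOM_MARKER = "java.lang.OutOfMemoryError"


def count_oom(lines: Iterable[str]) -> int:
    """Number of log lines mentioning an OutOfMemoryError."""
    return sum(1 for line in lines if OOM_MARKER in line)


def servers_with_oom(logs: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """Map each host whose log mentions an OOM to its count of such lines."""
    result = {}
    for host, lines in logs.items():
        n = count_oom(lines)
        if n:
            result[host] = n
    return result


def load_logs(log_dir: str, pattern: str = "*.log*") -> Dict[str, List[str]]:
    """Read matching files under `log_dir`; each file name stands in for a host."""
    return {p.name: p.read_text(errors="replace").splitlines()
            for p in sorted(Path(log_dir).glob(pattern))}
```

Usage would be along the lines of `servers_with_oom(load_logs("/path/to/copied/logs"))`, where the path is a placeholder; the hosts it reports are the ones worth profiling first.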


    Additional information and questions:
    1) We disabled automatic major compaction and run major compaction manually,
    but in the log file I found this entry:

    2011-08-07 20:57:20,706 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 5 file(s), new file=hdfs://hadoop-master.infolinks.local:8000/hbase/URLS/70c4ed1855cee6201e583662272f7a46/searches/6451756610532158137, size=6.7m; total size for store is 6.7m
    A minor compaction can be promoted to a major compaction if it ends up
    picking all of the store's files (look earlier in the log: it will start
    off as an 'ordinary' compaction and then later become a 'major' one).
    Yes, exactly. I'm just wondering whether it is possible to disable minor
    compaction as we did with major compaction.
    I found these configuration parameters in Store.java:


    http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/Store.html#836

    /**
     * Algorithm to choose which files to compact
     *
     * Configuration knobs:
     *  "hbase.hstore.compaction.ratio"
     *    normal case: minor compact when file <= sum(smaller_files) * ratio
     *  "hbase.hstore.compaction.min.size"
     *    unconditionally compact individual files below this size
     *  "hbase.hstore.compaction.max.size"
     *    never compact individual files above this size (unless splitting)
     *  "hbase.hstore.compaction.min"
     *    min files needed to minor compact
     *  "hbase.hstore.compaction.max"
     *    max files to compact at once (avoids OOM)
     */
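The knobs listed in that Javadoc can be read as a file-selection rule. Below is a simplified Python sketch of the rule as the Javadoc describes it, not a port of HBase's Store code: files are ordered oldest first, the ratio test compares a file with the sum of up to `max_files - 1` newer files, oversized files are simply dropped, and the parameter values are illustrative rather than confirmed HBase defaults. The function name `select_files` is hypothetical. It also illustrates the promotion mentioned earlier in the thread: when the selection happens to include every store file, the "minor" compaction is effectively a major one.

```python
from typing import List, Tuple


def select_files(
    sizes: List[int],           # store-file sizes, oldest first
    ratio: float = 1.2,         # "hbase.hstore.compaction.ratio" (illustrative)
    min_size: int = 0,          # "hbase.hstore.compaction.min.size"
    max_size: int = 2**63 - 1,  # "hbase.hstore.compaction.max.size"
    min_files: int = 3,         # "hbase.hstore.compaction.min"
    max_files: int = 10,        # "hbase.hstore.compaction.max"
) -> Tuple[List[int], bool]:
    """Return (indices of selected files, whether the selection covers every file)."""
    min_files = max(1, min_files)  # guard against empty-list indexing below
    # Never compact individual files above max_size (simplified: just drop them).
    candidates = [i for i, s in enumerate(sizes) if s <= max_size]
    start = 0
    # Walk from the oldest candidate, skipping files too big relative to newer ones.
    while len(candidates) - start >= min_files:
        size = sizes[candidates[start]]
        newer = sum(sizes[j] for j in candidates[start + 1:start + max_files])
        # Small files always qualify; otherwise apply the ratio test.
        if size <= max(min_size, newer * ratio):
            break
        start += 1
    selected = candidates[start:start + max_files]
    # Too few files: wait for more flushes instead of compacting.
    if len(selected) < min_files:
        return [], False
    # Selecting every store file makes this effectively a major compaction.
    return selected, len(selected) == len(sizes)
```

For example, with sizes `[100, 10, 10, 10]` the large old file fails the ratio test and only the three newer files are picked, whereas `[10, 10, 10]` selects everything and so behaves like a major compaction.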

    And what penalty or potential problems could we get by disabling minor
    compaction (if it is possible, of course)? Our use case is that we insert
    data on a daily basis and create predefined regions to avoid automatic
    splits.

    What additional tuning techniques could suit this kind of HBase usage?


    We start major compaction at 00:00 every day, but this entry was logged
    at 20:57:20, so how can I check whether the major compaction has
    finished?

    The compaction is async. Currently no flag is set on completion. This
    is an issue we need to figure out an answer for.
    And what could be the reason for a major compaction starting at that time?
    2) There are a lot of exceptions like this:
    2011-08-07 01:22:34,821 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 8041 caught: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
    What does this exception mean, and is this normal behaviour?

    The client has given up listening. Do you see a corresponding timeout
    around the same time on the client side?

    Where can I check that? Our client is the reducers of a map/reduce job.
    All I see in the reducers' logs is that they successfully connected to
    HBase. The ZooKeeper logs are pretty clean.

    3) There are log entries like this:
    2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: URLS,20110802_budnmarys.squarespace.com/picture-gallery/miscellaneous-gallery/1138727,1312377360131.e302bc31e326308031a82e9eca6e0b6a. state=OFFLINE, ts=1312726415824
    2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OFFLINE for too long, reassigning URLS,20110802_budnmarys.squarespace.com/picture-gallery/miscellaneous-gallery/1138727,1312377360131.e302bc31e326308031a82e9eca6e0b6a. to a random server
    2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: URLS,20110509_e,1305018012046.e48c6df0a31c41f482bcaccf71244ccb. state=OFFLINE, ts=1312726415824
    2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OFFLINE for too long, reassigning URLS,20110509_e,1305018012046.e48c6df0a31c41f482bcaccf71244ccb. to a random server
    2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: URLS,20110731_gg,1312187408164.e7fa3b00af458db5af93d5c475712f62. state=OFFLINE, ts=1312726415824

    What does "Regions in transition timed out" mean, and is this correct
    behaviour?

    Does it go on without ever resolving? If so, that is not usually a
    good sign. Some recent issues have addressed this, with fixes in
    0.90.4, which should be out soon (check its release notes for related
    issues).
    St.Ack

Discussion Overview
Group: user @
Categories: hbase, hadoop
Posted: Aug 7, '11 at 7:29p
Active: Aug 9, '11 at 11:09p
Posts: 3
Users: 2
Website: hbase.apache.org

2 users in discussion: Oleg Ruchovets (2 posts), Stack (1 post)
