Hi All:
I am using HBase 0.19.3 in a cluster (1 master and 2 region nodes).

I keep getting these errors:


2009-08-19 02:40:27,142 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call getRow([B@b73d39, [B@1471c1, null, 9223372036854775807, 10000, -1) from 10.254.154.95:48659: output error
2009-08-19 02:40:27,142 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020 caught: java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1119)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$2000(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:610)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:674)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:940)


Any idea?

-- Siva Jagadeesan


  • Siva Jagadeesan at Aug 19, 2009 at 9:29 pm
    Hi All:
    After chatting with jdcryans in the IRC channel, this is what we found out.

    We have 40 to 50 column families in some tables. HBase 0.19 opens two to
    three times as many files as 0.20. Even though we increased our open-file
    limit, we kept hitting a "Too many open files" exception, which killed the
    client. When a client dies in the middle of a call, you get a
    ClosedChannelException in the region servers.
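A minimal sketch of that failure mode: when the client goes away mid-call, the server's write to the dead connection fails. This uses a Python socket pair as an illustrative stand-in for the Java NIO channel (the exception class differs by platform, but the mechanism is the same):

```python
import socket

# A connected pair of sockets: a "client" end and a "server" end.
client, server = socket.socketpair()

# Simulate the client process dying mid-call.
client.close()

# The server's attempt to write the response now fails -- the Python
# analogue of the ClosedChannelException seen in the region server log.
error = None
try:
    # Keep writing until the kernel reports the peer is gone.
    for _ in range(100):
        server.send(b"response bytes")
except OSError as exc:  # typically BrokenPipeError on Linux
    error = exc
finally:
    server.close()
```

The point is that the server-side exception is a symptom: the real problem is whatever killed the client (here, the "Too many open files" error).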

    HBase is better off with fewer column families. The suggestion was to
    prefix column names instead of using separate column families.
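That suggestion can be sketched as a naming convention: keep one physical family and encode the old family name as a qualifier prefix, so grouping is preserved without the per-family file overhead. A plain-Python illustration (hypothetical helper names, not the HBase API):

```python
# Instead of 40-50 real column families, keep one family and encode the
# old family name as a prefix on the column qualifier.
SEPARATOR = ":"

def to_prefixed(family, qualifier):
    """Map a logical (family, qualifier) pair onto a single-family column name."""
    return f"{family}{SEPARATOR}{qualifier}"

def from_prefixed(column):
    """Recover the logical (family, qualifier) pair from a prefixed name."""
    family, _, qualifier = column.partition(SEPARATOR)
    return family, qualifier

# A row that used to span three families now lives under one family,
# with grouping preserved by the prefix.
logical = {("address", "city"): b"Oakland",
           ("address", "zip"): b"94607",
           ("contact", "email"): b"user@example.com"}

physical = {to_prefixed(f, q): v for (f, q), v in logical.items()}
```

Range scans over a prefix then recover the old per-family grouping, since qualifiers within a family are stored sorted.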

    I have included the IRC chat for your reference.

    Regards,

    -- Siva Jagadeesan

    sivajag_: Does anyone know about this error
    [11:13am] sivajag_: 009-08-19 02:40:27,142 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call getRow([B@b73d3
    [11:13am] sivajag_: 9, [B@1471c1, null, 9223372036854775807, 10000, -1) from 10.254.154.95:48659: output error
    [11:13am] sivajag_: 2009-08-19 02:40:27,142 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020 caught: java
    [11:13am] sivajag_: .nio.channels.ClosedChannelException
    [11:13am] sivajag_: at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
    [11:13am] sivajag_: at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
    [11:13am] sivajag_: at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1119)
    [11:13am] sivajag_: at org.apache.hadoop.hbase.ipc.HBaseServer.access$2000(HBaseServer.java:70)
    [11:13am] sivajag_: at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:610)
    [11:13am] sivajag_: at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:674)
    [11:13am] sivajag_: at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:940)
    [11:14am] sivajag_: I am using Hbase 0.19.3 Cluster ( 1 Master and 2 Region Nodes)
    [11:16am] jdcryans: sivajag_: you are getting this in a region server?
    [11:17am] sivajag_: yes
    [11:18am] jdcryans: seems that a client was doing a getRow and was somehow killed/stopped
    [11:19am] sivajag_: mmm
    [11:20am] sivajag_: It is a long running process that does getRow
    [11:20am] sivajag_: it ran fine for an hour .. than I started getting these errors
    [11:21am] jdcryans: what do you get on the client side?
    [11:23am] sivajag_: 2009-08-19 08:29:11: java.lang.reflect.UndeclaredThrowableException
    [11:23am] sivajag_: $Proxy0.findRootRegion(Unknown Source)
    [11:23am] sivajag_: org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:770)
    [11:23am] sivajag_: org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:465)
    [11:23am] sivajag_: org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
    [11:23am] sivajag_: org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:515)
    [11:23am] sivajag_: org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:474)
    [11:23am] sivajag_: org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
    [11:23am] sivajag_: org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:515)
    [11:23am] sivajag_: org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:478)
    [11:25am] jdcryans: so you were only doing reads in hbase and all of a sudden that happened?
    [11:26am] sivajag_: we were doing reads from one table and write to another table
    [11:27am] jdcryans: since you only have 3 nodes, could you pastbin the logs of the master and region servers? like 200-300 lines around when that happened
    [11:28am] RaySl joined the chat room.
    [11:28am] sivajag_: ok
    [11:28am] sivajag_: give me
    [11:29am] sivajag_: a minute
    [11:30am] jdcryans: np
    [11:34am] sivajag_: http://pastie.org/588828
    [11:34am] sivajag_: i have pasted log info at http://pastie.org/588828
    [11:38am] rathore joined the chat room.
    [11:38am] jdcryans: sivajag_: there is nothing in the master log around 02:37:58?
    [11:39am] sivajag_: let me check
    [11:39am] sivajag_: 2009-08-19 02:36:33,140 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {regionname: -ROOT-,,0, startKey: <>, server: 10.208.30.79:60020} complete
    [11:39am] sivajag_: 2009-08-19 02:36:42,041 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: 10.208.30.79:60020}
    [11:39am] sivajag_: 2009-08-19 02:36:42,130 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 22 row(s) of meta region {regionname: .META.,,1, startKey: <>, server: 10.208.30.79:60020} complete
    [11:39am] sivajag_: 2009-08-19 02:36:42,130 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
    [11:39am] sivajag_: 2009-08-19 02:37:33,115 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {regionname: -ROOT-,,0, startKey: <>, server: 10.208.30.79:60020}
    [11:39am] sivajag_: 2009-08-19 02:37:35,916 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {regionname: -ROOT-,,0, startKey: <>, server: 10.208.30.79:60020} complete
    [11:39am] sivajag_: 2009-08-19 02:37:42,042 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: 10.208.30.79:60020}
    [11:39am] sivajag_: 2009-08-19 02:37:43,803 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 22 row(s) of meta region {regionname: .META.,,1, startKey: <>, server: 10.208.30.79:60020} complete
    [11:39am] sivajag_: 2009-08-19 02:37:43,804 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
    [11:39am] sivajag_: 2009-08-19 02:38:33,116 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {regionname: -ROOT-,,0, startKey: <>, server: 10.208.30.79:60020}
    [11:39am] sivajag_: 2009-08-19 02:38:33,478 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {regionname: -ROOT-,,0, startKey: <>, server: 10.208.30.79:60020} complete
    [11:39am] sivajag_: 2009-08-19 02:38:42,043 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: 10.208.30.79:60020}
    [11:39am] sivajag_: 2009-08-19 02:38:42,109 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 22 row(s) of meta region {regionname: .META.,,1, startKey: <>, server: 10.208.30.79:60020} complete
    [11:39am] sivajag_: 2009-08-19 02:38:42,109 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
    [11:39am] sivajag_: 2009-08-19 02:39:33,117 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {regionname: -ROOT-,,0, startKey: <>, server: 10.208.30.79:60020}
    [11:39am] You left the chat by being disconnected from the server.
    [11:40am] You rejoined the room.
    [11:42am] jdcryans: sivajag_: it's better to pastebin every log
    [11:42am] jdcryans: or you get easily kicked out
    [11:42am] jdcryans: so is there any exception before that?
    [11:43am] sivajag_: hehe
    [11:43am] sivajag_: I will pastebin
    [11:43am] sivajag_: there is an exception in master
    [11:43am] sivajag_: let me pastebin
    [11:46am] sivajag_: pasted in the same place
    [11:46am] sivajag_: http://pastie.org/588828
    [11:48am] jdcryans: and there is nothing else before that seems wrong?
    [11:48am] sivajag_: no
    [11:49am] jdcryans: that's weird... haven't seen this type of error much...
    [11:49am] sivajag_: the problem I am having is I am not sure whether it is my hbase setup or our code doing something funcky
    [11:49am] sivajag_: too many moving wheels
    [11:50am] jdcryans: well I've always seen these errors when clients were getting killed, almost never in another context
    [11:51am] nitay joined the chat room.
    [11:52am] MrBradford: wow, you guys are right. IntelliJ is niiiiice
    [11:53am] jdcryans: sivajag_: anything else on the client side apart from the java.lang.reflect.UndeclaredThrowableException?
    [11:53am] sivajag_: let me check
    [11:57am] sivajag_: jdcryans: another one
    [11:57am] sivajag_: http://pastie.org/588828
    [11:58am] jdcryans: sivajag_: exception: java.io.IOException: Too many open files
    [12:00pm] jdcryans: is this a 2.6.27 kernel?
    [12:00pm] jdcryans: and did you change the nofile limit?
    [12:01pm] sivajag_: It is Fedora release 8 (Werewolf)
    [12:01pm] sivajag_: Kernel \r on an \m
    [12:01pm] sivajag_: i am not sure I did change the nofile limit
    [12:01pm] sivajag_: how do I check
    [12:02pm] sivajag_: what it is set currently
    [12:02pm] jgray: ulimit -n
    [12:02pm] sivajag_: ulimit -n
    [12:02pm] sivajag_: 32768
    [12:03pm] sivajag_: in all my three boxes
    [12:04pm] MrBradford: I that that's correct.
    [12:05pm] jdcryans: sivajag_: and uname -r?
    [12:05pm] sivajag_: uname -r
    [12:05pm] sivajag_: 2.6.21.7-2.fc8xen
    [12:05pm] jdcryans: mm
    [12:06pm] jdcryans: I would be surprised that you hit your nofile limit and the epoll limit wasn't in 2.6.21...
    [12:06pm] jdcryans: do you have tons of families per table?
    [12:07pm] sivajag_: 40 to 50 families per table
    [12:07pm] sivajag_: some have only like 10
    [12:07pm] jgray: that's a lot
    [12:07pm] jdcryans: yep, that could be the problem
    [12:08pm] jdcryans: why do you need so many families?
    [12:08pm] MrBradford: Esp in .19, I'd imagine
    [12:08pm] jgray: 0.19 has 2-3X the number of open files as 0.20
    [12:09pm] sivajag_: we need that many families for our business solution
    [12:09pm] jgray: are you sure?
    [12:10pm] jgray: it's a bit high... more than 10-20 is really pushing it, even those are high
    [12:10pm] jgray: each family is basically another table, but grouped by row key
    [12:10pm] rathore: we're using column names as data... and flattening 2-level object graphs into single hbase rows
    [12:11pm] jgray: could u prefix your column names
    [12:11pm] rathore: which is why a couple of tables have as many as 30-40 column families
    [12:11pm] jgray: to get grouping
    [12:12pm] rathore: so ur saying try to simulate "column-families" with a column name prefix
    [12:12pm] jgray: perhaps
    [12:13pm] rathore: hmm
    [12:13pm] rathore: so hbase cant really have too many column families
    [12:13pm] rathore: i was under the impressions that i could have thousands
    [12:13pm] jgray: there's nothing saying u can't have 50 families, but it's beyond what we've targeted
    [12:13pm] rathore: oh i see
    [12:13pm] jdcryans: the practical limit would be 100
    [12:13pm] rathore: ok
    [12:13pm] jgray: it's expensive though, in every way
    [12:13pm] rathore: and thousands of columns (instead of families?)
    [12:14pm] jdcryans: but you would need hbase-0.20 for 100 families
    [12:14pm] jgray: yes, thousands of columns
    [12:14pm] rathore: got it
    [12:14pm] jdcryans: columns you can have millions
    [12:14pm] jgray: 0.19 supported 10s of thousands in a family
    [12:14pm] jgray: 0.20 supports millions
    [12:14pm] rathore: so whats probably what our problem is
    [12:14pm] rathore: ok
    [12:14pm] jdcryans: btw you can check the number of open files with lsof | wc -l
    [12:15pm] rathore: yes
    [12:15pm] rathore: we're definitely seeing lots of open file handles
    [12:16pm] rathore: thanks guys
    [12:17pm] rathore: this was stupid design on our part then
    [12:17pm] jdcryans: better to learn it now
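The checks mentioned in the chat (`ulimit -n`, `lsof | wc -l`) can also be done from inside a process. A Linux-only sketch (assumes `/proc` and the Unix-only `resource` module are available):

```python
import os
import resource

# Soft/hard limit on open file descriptors for this process
# (the soft limit is what `ulimit -n` reports in the shell).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Count descriptors currently open in this process. /proc/self/fd is
# Linux-specific; the chat's `lsof | wc -l` counts across all processes.
open_fds = len(os.listdir("/proc/self/fd"))

print(f"open fds: {open_fds} / soft limit: {soft}")
```

A server that logs this periodically makes it obvious when it is creeping toward the nofile limit, instead of failing with "Too many open files" at the worst moment.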
    On Wed, Aug 19, 2009 at 11:08 AM, Siva Jagadeesan wrote:
