FAQ
Is anyone else having the same problems as me with regard to frequently
seeing "NotServingRegionException" and "IllegalStateException: region
offline" exceptions when trying to load data into an hbase instance?

My setup uses
- hadoop (2008-01-14 snapshot)
- a single-server hadoop cluster, as described in
http://wiki.apache.org/hadoop/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)
- a single-server hbase cluster, as described in
http://wiki.apache.org/hadoop/Hbase/10Minutes (running on the same
hardware as the hadoop cluster)
- client java application running on the same hardware

I have 1 table containing 4 column families, and I am attempting to
write about 5,000,000 rows of data, averaging at most a few KB each, in a
single thread.
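
(For concreteness, a minimal sketch of that kind of single-threaded loader,
written against the early 2008-era HBase client API; the class and method
names here are from memory, and the table, column, and row names are made
up purely for illustration:)

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTable;
    import org.apache.hadoop.io.Text;

    public class SingleThreadedLoader {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration();
            // "mytable" and "cf1:col" are hypothetical names for illustration.
            HTable table = new HTable(conf, new Text("mytable"));
            for (int i = 0; i < 5000000; i++) {
                Text row = new Text(String.format("row-%09d", i));
                long lockid = table.startUpdate(row);               // begin a row update
                table.put(lockid, new Text("cf1:col"), payload(i)); // one cell; in practice one per family
                table.commit(lockid);                               // push the row to the regionserver
            }
        }

        private static byte[] payload(int i) {
            return ("value-" + i).getBytes(); // placeholder; real values were a few KB
        }
    }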

What happens then is that several tens of thousands of rows write
perfectly, and then, I guess, the server starts to split regions and
starts throwing the above-mentioned exceptions constantly. The client
then spends its time waiting and retrying, and my upload rate drops
from several thousand rows per minute to less than 50 rows per minute,
with a couple of errors (i.e. exceptions that made their way all the
way out to the client application) per minute. Regions, and therefore
hbase itself, seem to spend more time offline than online.

Did anyone else have a similar experience, and does anyone have a
suggestion for how to improve the reliability of my setup?

- Marc

  • Bryan Duxbury at Jan 24, 2008 at 8:36 pm
    When there are splits going on, NSREs are expected. I would say that
    it is fairly unexpected for them to bubble all the way up to the
    client application, though.

    Is there anything else in your master or regionserver logs? Are you
    running at DEBUG log level for HBase? I'd like to try and figure this
    one out if possible.
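
    (For reference, turning on DEBUG for HBase is the usual log4j one-liner
    in the log4j.properties that ships with the install; the property below
    simply follows the standard log4j convention for the
    org.apache.hadoop.hbase package:)

        # log everything from HBase classes at DEBUG
        log4j.logger.org.apache.hadoop.hbase=DEBUG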

    I will say that I am running much bigger imports than what it sounds
    like you're doing, and it's working, albeit on a 13-node cluster, not
    a single machine. It's possible you're just trying to write too fast
    for your hardware to keep up, since it is playing every role, but I'd
    still expect it to keep working.

    -Bryan
  • Stack at Jan 24, 2008 at 9:38 pm
    I've seen the ISEs myself (HADOOP-2692). As Bryan says, the NSREs are
    part of 'normal' operation; they only show up when running at DEBUG
    level, unless we run out of retries, in which case the NSRE is thrown
    as an error.

    FYI, 5M rows single-threaded will take forever to load. I'd suggest you
    set up an MR job.

    Lots of splitting will have regions offline for a while, and if everything
    is running on the one server, it could take a while for them to come back
    online (digging in the logs, you should be able to piece together the story).

    Also of note: when a regionserver judges itself overloaded, it'll block
    updates until it's had a chance to catch its breath. If these intervals
    go on too long, this could be another reason your clients fail.

    St.Ack

  • Marc Harris at Jan 25, 2008 at 4:22 pm
    To Bryan's points:

    1) I've seen the discussion about the fact that exceptions can appear in
    the logs but are mitigated by retry. I am only concerned that they bubble
    all the way up to the client. This is happening as a result of exhausting
    the number of retries; there's nothing more sinister happening there.

    2) There does not appear to be anything else significant in the logs. I
    can send them to you if you like but I think my previous comment may
    cause you to be less interested.

    3) About success running on a 13 node cluster. I think that's really the
    question. Should I expect this data load to work reasonably well on a
    single node cluster or not?

    To stack's points:

    4) Could you explain what you mean by "forever to load"? During the
    phases it was working I would get about 100 rows per second, which was
    sufficient for me. Also could you explain why setting up a mapreduce job
    would make things more efficient in a single server setup? Are things
    not limited by disk access either way?

    5) When a regionserver judges itself overloaded and blocks updates, can
    another regionserver take up the load for all subsequent updates, or do
    certain updates (based on row key presumably) have to go to that
    regionserver?

    - Marc
  • Stack at Jan 25, 2008 at 6:45 pm

    Marc Harris wrote:
    To Bryan's points: ...
    2) There does not appear to be anything else significant in the logs. I
    can send them to you if you like but I think my previous comment may
    cause you to be less interested.
    Send them to me if you don't mind. I'd look at them to see what was
    going on in the regionserver such that the client couldn't get an update
    in during a full run of retries (I'd guess it has to do with HADOOP-2712
    and HADOOP-2615).

    3) About success running on a 13 node cluster. I think that's really the
    question. Should I expect this data load to work reasonably well on a
    single node cluster or not?
    I don't know about 'reasonably well'. Single-node is sub-optimal but it
    should be possible to load it w/ a decent amount of data w/o failures.
    To stack's points:

    4) Could you explain what you mean by "forever to load"? During the
    phases it was working I would get about 100 rows per second, which was
    sufficient for me. Also could you explain why setting up a mapreduce job
    would make things more efficient in a single server setup? Are things
    not limited by disk access either way?
    Pardon me. I presumed multiple cores and was suggesting MR as one means
    of putting up multiple concurrent upload clients. Yeah, disk is a
    bottleneck.
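
    (In that spirit, a rough sketch of running several concurrent upload
    clients in one JVM rather than via MR; writeRow below is a hypothetical
    stand-in for whatever single-row write the existing loader already does,
    and the fixed-slice partitioning is just one obvious choice:)

        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.TimeUnit;

        public class ParallelLoader {
            // Stand-in for the existing single-row write logic.
            public interface RowWriter { void writeRow(int rowIndex) throws Exception; }

            public static void load(final RowWriter writer, final int totalRows,
                                     int threads) throws InterruptedException {
                ExecutorService pool = Executors.newFixedThreadPool(threads);
                final int chunk = totalRows / threads + 1;
                for (int t = 0; t < threads; t++) {
                    final int start = t * chunk;
                    pool.submit(new Runnable() {
                        public void run() {
                            // Each worker loads its own contiguous slice of the rows.
                            for (int r = start; r < Math.min(start + chunk, totalRows); r++) {
                                try {
                                    writer.writeRow(r);
                                } catch (Exception e) {
                                    e.printStackTrace(); // real code would back off and retry
                                }
                            }
                        }
                    });
                }
                pool.shutdown();
                pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
            }
        }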

    5) When a regionserver judges itself overloaded and blocks updates, can
    another regionserver take up the load for all subsequent updates, or do
    certain updates (based on row key presumably) have to go to that
    regionserver?
    The latter.
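
    (Conceptually: each region owns a contiguous range of row keys and is
    hosted by exactly one regionserver, so a write for a given key has
    nowhere else to go. A toy illustration of that lookup, not HBase
    internals; the key ranges and server names are made up:)

        import java.util.TreeMap;

        public class RegionLookupSketch {
            public static void main(String[] args) {
                // Toy map from region start key to the server hosting that region.
                TreeMap<String, String> startKeyToServer = new TreeMap<String, String>();
                startKeyToServer.put("",  "server-A"); // region [ "",  "m" )
                startKeyToServer.put("m", "server-A"); // region [ "m", "t" )
                startKeyToServer.put("t", "server-B"); // region [ "t", end )

                String rowKey = "panther";
                // The write must go to the server whose region start key is the
                // greatest key <= the row key; no other server can accept it.
                String owner = startKeyToServer.floorEntry(rowKey).getValue();
                System.out.println(rowKey + " -> " + owner); // prints: panther -> server-A
            }
        }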

    St.Ack

Discussion Overview
group: common-user @ hadoop.apache.org
categories: hadoop
posted: Jan 24, '08 at 8:25p
active: Jan 25, '08 at 6:45p
posts: 5
users: 3
website: hadoop.apache.org...
irc: #hadoop
