Grokbase Groups HBase dev April 2010
Should HTable.put() return a Future?
Hi,

I have an issue where I do a bulk import and, since the WAL is off and the
default write buffer is used (TableOutputFormat), I am running into
situations where the MR job completes successfully but not all data is
actually stored. The issue seems to be a failure on the RS side as it
cannot flush the write buffers because the MR job overloads the cluster
(usually the .META. hosting RS is the breaking point) or causes the
underlying DFS to go slow, and that has repercussions all the way up to
the RSs.

My question is: would it make sense, as with any other asynchronous IO, to
return a Future from put() that would help check the status of the actual
server-side async flush operation? Or am I misguided here? Please advise.

Lars
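
As a sketch of the kind of API being asked for here (purely hypothetical;
nothing like this exists in HTable):

    import java.util.concurrent.Future;
    import org.apache.hadoop.hbase.client.Put;

    // Hypothetical interface, for illustration only: an asynchronous put whose
    // Future completes only once the edit is durable on the server side, so a
    // region server failure would surface to the caller (e.g. via
    // ExecutionException from get()) instead of being silently lost.
    public interface AsyncPutTable {
      Future<Void> putAsync(Put put);
      // a client would then do: table.putAsync(put).get();
    }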


  • Jean-Daniel Cryans at Apr 6, 2010 at 4:31 pm
    The issue isn't with the write buffer here, it's the WAL. Your edits
    are in the MemStore, so as far as your clients can tell, the data is
    all persisted. In this case you would need to know when all the
    memstores that contain your data are flushed... Best practice when
    turning off the WAL is to force-flush the tables after the job is done;
    otherwise you can't guarantee durability for the last edits.

    J-D
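
A minimal sketch of the force-flush step described above, assuming a
0.90-style client API (HTable.flushCommits and HBaseAdmin.flush; names
differ slightly between versions), to be run once the MR job has finished:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;

    public class PostJobFlush {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "bulkload_table"); // table name is an example
        table.flushCommits();  // drain the client-side write buffer
        table.close();
        // Ask the region servers to flush their memstores for this table.
        // Note: flush() is asynchronous, so this requests the flush but does
        // not wait for it to complete.
        new HBaseAdmin(conf).flush("bulkload_table");
      }
    }
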
  • Todd Lipcon at Apr 6, 2010 at 4:37 pm

    You still can't guarantee durability for any of the edits, since a failure
    in the middle of your job is undetectable :)

    -Todd

    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Lars George at Apr 6, 2010 at 4:43 pm
    That is my issue: you sort of fire and forget the updates. Even
    flushing the writes will not help as far as I can see. If you have a
    server fail in the process of persisting its memstore data, the error
    is not sent back to the caller. Only a deep log file analysis may
    reveal the issue, and even telling what is missing will be difficult
    as all you see is an IOE?
  • Jean-Daniel Cryans at Apr 6, 2010 at 4:47 pm
    Yes it is, you will be missing a RS ;)

    The general rule when uploading without the WAL is that if there's a
    failure, the job is screwed; that's the tradeoff for speed.

    J-D
    On Tue, Apr 6, 2010 at 9:36 AM, Todd Lipcon wrote:
    You still can't guarantee durability for any of the edits, since a failure
    in the middle of your job is undetectable :)
  • Todd Lipcon at Apr 6, 2010 at 4:47 pm

    On Tue, Apr 6, 2010 at 9:43 AM, Lars George wrote:

    That is my issue, you sort of fire and forget about the updates. Even
    flushing the writes will not help as far as I see it. If you have a
    server fail in the process of persisting its memstored data the error
    is not sent back to the caller. Only a deep log file analysis may
    reveal the issue, but even telling what is missing will be difficult
    as all you see is an IOE?
    Agreed; if you do log analysis and determine there was a failure during
    your upload, you essentially have to start over.

    Actually writing and flushing a WAL is the only way to get correct
    behavior.

    -Todd

    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Jean-Daniel Cryans at Apr 6, 2010 at 4:48 pm
    Then if you can't tolerate the low durability level, you probably
    should use the WAL?

    J-D
  • Todd Lipcon at Apr 6, 2010 at 4:50 pm

    On Tue, Apr 6, 2010 at 9:46 AM, Jean-Daniel Cryans wrote:

    Yes it is, you will be missing a RS ;)
    How do you detect this, though?

    It might be useful to add a counter in ZK for region server crashes. If the
    master ever notices that a RS goes down, it increments it. Then we can check
    the before/after for a job and know when we might have lost some data.

    -Todd

    --
    Todd Lipcon
    Software Engineer, Cloudera
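
A rough sketch of the crash-counter check Todd describes, using the plain
ZooKeeper client API; the znode path and the counter itself are hypothetical,
not an existing HBase feature:

    import java.util.Arrays;
    import org.apache.zookeeper.ZooKeeper;

    public class CrashCounterCheck {
      public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, null);
        // Hypothetical znode the master would bump each time it notices a dead RS.
        byte[] before = zk.getData("/hbase/rs-crash-counter", false, null);
        // ... run the bulk-load MR job here ...
        byte[] after = zk.getData("/hbase/rs-crash-counter", false, null);
        if (!Arrays.equals(before, after)) {
          System.err.println("A region server died during the job; the upload may have lost data.");
        }
        zk.close();
      }
    }
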
  • Jonathan Gray at Apr 6, 2010 at 5:17 pm
    I like this idea.

    Putting major cluster events in some form into ZK could be used for jobs as Todd says, and also for a cluster history report on the web UI and such; a higher-level historian.

    I'm a fan of anything that moves us away from requiring parsing hundreds or thousands of lines of logs to see what has happened.

    JG
  • Lars George at Apr 6, 2010 at 7:42 pm
    I agree with Jon here; parsing these files, especially without central
    logging, is bad. I tried Splunk and that sort of worked as well for
    quickly scanning for exceptions. A problem was multiline stack traces
    (which they usually all are): they got mixed up when multiple servers
    sent events at the same time, and the Splunk data got all garbled.
    But something like that, yeah.

    Maybe with the new Multiput style stuff the WAL is not such a big
    overhead anymore?

    Lars
  • Paul Smith at Apr 7, 2010 at 12:50 am
    Gents, I'm working on something in the background that may certainly help here.

    Right now we (Aconex) have open sourced a performance library for Java called Parfait (ASL 2.0 licensed, see [1]). It is generic, but initially targeted at writing data out in a format compatible with SGI's open-sourced Performance Co-Pilot (PCP, see [2]).

    PCP is designed for monitoring WTF goes on in large clusters (see [3], hopefully that's big enough for you...). It integrates application, OS and hardware metrics to give a holistic view of what is going on. It logs data into archives, allowing one to retrospectively go over an event and analyse it. One can then build inference rules to test theories and run them over the archives to look back in time for similar events; the same rules can also be used on live data to trigger things like Nagios alarms. We use this extensively at Aconex, and I could not possibly live without it. Hard data gets to the bottom of things quickly. Looking through all those log4j log lines would do my head in; I tip my hat to you for trying. PCP is totally cross-platform (Linux, Mac, Windows).

    Parfait can poll JMX counters, or counters can be invoked directly. I'm working on a MetricContext that exports all HBase and Hadoop JMX counters into Parfait. The goal is to have PCP visualize data more effectively for HBase/Hadoop clusters. For an example of the sort of visualization I'd love to have for HBase & Hadoop, see the simple working picture of a 3D visualisation at [4] below; it's basic, but imagine a 3D view of all the HBase region servers showing HBase-specific metrics, played back in real time or retrospectively at any pace you want.

    I originally posted this to Hadoop back in September (see [5]), but no one seemed that interested, which is a bit weird. I had planned to make more progress on this but the Mavenization got in the way.

    At any rate, I'd like to discuss further what you think you need for analysing these types of problems. The log4j logging is good (great for some types of basic analysis), but I think we can do better, and I think Parfait & PCP could really help you guys in production a LOT.

    [1] Parfait - http://code.google.com/p/parfait/
    [2] PCP - http://oss.sgi.com/projects/pcp/
    [3] NASA's SGI Columbia Supercomputer - http://www.nas.nasa.gov/News/Images/Images/AC04-0208-9.jpg
    [4] Clusterviz - http://people.apache.org/~psmith/clustervis.png
    [5] Original Hadoop mail - http://markmail.org/search/?q=3D%20Cluster%20Performance%20Visualization#query:3D%20Cluster%20Performance%20Visualization+page:1+mid:4t52nnla4snntwow+state:results
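
As a reference point for the JMX polling Paul mentions, a minimal sketch
using the standard javax.management remote API (not Parfait itself); the
port, MBean name and attribute below are assumptions for illustration only:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class JmxPoll {
      public static void main(String[] args) throws Exception {
        // Assumed JMX endpoint of a region server started with remote JMX enabled.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://regionserver-host:10102/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
          MBeanServerConnection conn = connector.getMBeanServerConnection();
          // Assumed MBean/attribute names; actual HBase metric names vary by version.
          ObjectName name = new ObjectName(
              "hadoop:service=RegionServer,name=RegionServerStatistics");
          Object requests = conn.getAttribute(name, "requests");
          System.out.println("requests = " + requests);
        } finally {
          connector.close();
        }
      }
    }
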
  • Ryan Rawson at Apr 7, 2010 at 12:58 am
    Generally I can't agree that turning off the WAL is a good idea. You
    get speed, but at what cost? It's also kind of like punting on making
    HLog fast, and it reduces the incentive to do so. I think that a
    single flush/sync per batch-put call will improve speed to the point
    where running with the WAL turned off will be of minimal value.

    -ryan

    On Tue, Apr 6, 2010 at 12:41 PM, Lars George wrote:
    Maybe with the new Multiput style stuff the WAL is not such a big
    overhead anymore?
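
For reference, a minimal sketch of the per-edit switch under discussion,
assuming a 0.90-style client API where Put.setWriteToWAL(boolean) controls
whether an edit is written to the HLog:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WalTradeoff {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "bulkload_table"); // example table name
        Put p = new Put(Bytes.toBytes("row1"));
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("val"));
        // The speed/durability tradeoff discussed in this thread: skipping the
        // WAL means the edit cannot be recovered by log replay if the region
        // server dies before its memstore is flushed.
        p.setWriteToWAL(false);
        table.put(p);
        table.flushCommits();
        table.close();
      }
    }
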
  • Paul Smith at Apr 7, 2010 at 1:11 am
    Parfait can poll JMX counters, or counters can be invoked directly. I'm working on a MetricContext that exports all HBase and Hadoop JMX counters into Parfait.
    btw, we also export all the JVM metrics here too: GC activity (rates and times spent, for both major and minor GCs), class compilations, memory segment sizes (heap, perm gen, code area, etc).

    If HBase metrics like compactions and splits were exported into PCP, one could see the impact across hardware (CPU, virtual memory, disk) and JVM-level stuff (heap sizes and GC) correlating with HBase activity.

    Parfait can also collect metrics on a per-thread (ThreadLocal) basis to allow individual request collection. For example, right now in production we can see for every request (a Controller/Servlet) this sort of data in our log files:

    [2010-04-07 11:06:28,569 INFO ][EventMetricCollector][http-2001-Processor85 g7pfur4y][59.167.192.26][228349] Top ViewCorrespondenceControl ViewCorrespondenceControl Elapsed time: own 3113ms, total 3117ms Total CPU: own 10ms, total 30ms User CPU: own 10ms, total 20ms System CPU: own 10ms, total 10ms Blocked count: own 0, total 0 Blocked time: own 0ms, total 0ms Wait count: own 0, total 0 Wait time: own 0ms, total 0ms Database execution time: own 3050ms, total 3050ms Database execution count: own 12, total 12 total Database CPU time: own 0, total 0 Error Pages: own 0, total 1

    I'd hope that through a similar mechanism I could instrument the HBase Scan costs of a particular User activity and see how many rows were read over, and how many cell values were picked out for a single request. This allows us to narrow in quickly on which activity (controller action) or which users are consuming the most of a certain resource, find out why and fix it. We import our PCP data into a datawarehouse for longer term capacity planning too.

    anyway, some more ideas to kick around and discuss.

    Paul

Discussion Overview
group: dev @ hbase.apache.org
categories: hbase, hadoop
posted: Apr 6, '10 at 11:02a
active: Apr 7, '10 at 1:11a
posts: 13
users: 6
