get speed, but at what cost? Also it's kind of like punting on making
HLog fast, and it reduces the incentive to do so. I think that a
single flush/sync per batch-put call will improve speed to the point
where running with WAL turned off will be of minimal value.
-ryan
On Tue, Apr 6, 2010 at 12:41 PM, Lars George wrote:
I agree with Jon here; parsing these files, especially without central
logging, is bad. I tried Splunk and that sort of worked as well for
quickly scanning for exceptions. A problem was multiline stacktraces
(which they usually all are): they got mixed up when multiple servers
sent events at the same time, and the Splunk data got all garbled then.
But something like that, yeah.
Maybe with the new multi-put style stuff the WAL is not such a big
overhead anymore?
Lars
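To make the tradeoff concrete: dropping the WAL per Put versus keeping the WAL and batching puts (the "single flush/sync per batch-put call" Ryan describes above) looks roughly like this in the client API of that era. The table, family and row names are invented, and exact method names shifted between releases, so treat this as a sketch rather than exact signatures.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BulkPutSketch {
  public static void main(String[] args) throws IOException {
    HTable table = new HTable(HBaseConfiguration.create(), "mytable");

    // Option 1: drop the WAL on each Put. Fast, but edits sitting only in
    // the MemStore vanish if the region server dies before a flush.
    Put noWal = new Put(Bytes.toBytes("row-0"));
    noWal.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
    noWal.setWriteToWAL(false);
    table.put(noWal);

    // Option 2: keep the WAL but send puts in batches; the hope, per Ryan
    // above, is one log flush/sync per batch call instead of one per edit.
    List<Put> batch = new ArrayList<Put>();
    for (int i = 1; i <= 1000; i++) {
      Put p = new Put(Bytes.toBytes("row-" + i));
      p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
      batch.add(p);
    }
    table.put(batch);
  }
}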
On Tue, Apr 6, 2010 at 7:12 PM, Jonathan Gray wrote:
I like this idea.
Putting major cluster events in some form into ZK. Could be used for jobs as Todd says. Can also be used as a cluster history report on web ui and such. Higher level historian.
I'm a fan of anything that moves us away from requiring parsing hundreds or thousands of lines of logs to see what has happened.
JG
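As a rough sketch of the "higher level historian" idea, major events could be appended under a ZooKeeper path and listed by the web UI. The /hbase/events path and the record format below are invented for illustration; nothing creates or reads them today.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class ClusterEventLog {
  // Hypothetical parent znode; it would have to be created at cluster startup.
  private static final String EVENTS_PATH = "/hbase/events";

  private final ZooKeeper zk;

  public ClusterEventLog(ZooKeeper zk) {
    this.zk = zk;
  }

  // Append one human-readable event, e.g. "regionserver rs3 expired".
  // PERSISTENT_SEQUENTIAL gives each event an ordered, unique znode name, so
  // a web UI "historian" page could simply list the children to show history.
  public void record(String event) throws Exception {
    byte[] payload = (System.currentTimeMillis() + " " + event).getBytes("UTF-8");
    zk.create(EVENTS_PATH + "/event-", payload,
        Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
  }
}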
-----Original Message-----
From: Todd Lipcon
Sent: Tuesday, April 06, 2010 9:49 AM
To: [email protected]
Subject: Re: Should HTable.put() return a Future?
On Tue, Apr 6, 2010 at 9:46 AM, Jean-Daniel Cryans wrote:
Yes it is, you will be missing a RS ;)
How do you detect this, though? It might be useful to add a counter in ZK
for region server crashes. If the master ever notices that a RS goes down,
it increments it. Then we can check the before/after for a job and know
when we might have lost some data.
-Todd
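A minimal sketch of the job-side check, assuming a hypothetical /hbase/rs-crash-count znode that the master would bump whenever it processes a dead region server; neither the znode nor the master-side code exists today.

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class RsCrashCounter {
  // Hypothetical counter znode, bumped by the master on every RS death.
  private static final String COUNTER_PATH = "/hbase/rs-crash-count";

  private final ZooKeeper zk;

  public RsCrashCounter(ZooKeeper zk) {
    this.zk = zk;
  }

  // Read the current crash count; 0 if the counter has never been written.
  public long read() throws Exception {
    Stat stat = zk.exists(COUNTER_PATH, false);
    if (stat == null) {
      return 0L;
    }
    byte[] data = zk.getData(COUNTER_PATH, false, null);
    return Long.parseLong(new String(data, "UTF-8"));
  }
}

A job driver would call read() before and after the upload; any increase means a region server died mid-job, so edits written without the WAL may have been lost.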
General rule when uploading without WAL is if there's a failure, the
job is screwed and that's the tradeoff for speed.
J-D
On Tue, Apr 6, 2010 at 9:36 AM, Todd Lipcon <[email protected]> wrote:
On Tue, Apr 6, 2010 at 9:31 AM, Jean-Daniel Cryans wrote:
The issue isn't with the write buffer here, it's the WAL. Your edits
are in the MemStore so as far as your clients can tell, the data is
all persisted. In this case you would need to know when all the
memstores that contain your data are flushed... Best practice when
turning off WAL is force flushing the tables after the job is done,
else you can't guarantee durability for the last edits.
J-D
You still can't guarantee durability for any of the edits, since a failure
in the middle of your job is undetectable :)
-Todd
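For reference, the force flushing J-D recommends above can be triggered from the client via HBaseAdmin. The table name here is invented, and the flush call only requests a flush and returns, so a careful job would still wait or verify afterwards.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class FlushAfterJob {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    // Ask the region servers to flush every memstore of the table so edits
    // written with the WAL off finally reach HDFS.
    admin.flush("mytable");
  }
}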
On Tue, Apr 6, 2010 at 4:02 AM, Lars George wrote:
Hi,
I have an issue where I do bulk import and since WAL is off and the
default write buffer is used (TableOutputFormat) I am running into
situations where the MR job completes successfully but not all data is
actually restored. The issue seems to be a failure on the RS side as
it cannot flush the write buffers because the MR overloads the cluster
(usually the .META. hosting RS is the breaking point) or causes the
underlying DFS to go slow, and that has repercussions all the way up to
the RS's.
My question is, would it make sense, as with any other asynchronous IO,
to return a Future from the put() that will help checking the status
of the actual server side async flush operation? Or am I misguided
here? Please advise.
Lars
--
Todd Lipcon
Software Engineer, Cloudera
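To make Lars's question concrete, the API shape being asked about might look something like the following. It is purely hypothetical; no such interface exists in the client.

import java.io.IOException;
import java.util.concurrent.Future;

import org.apache.hadoop.hbase.client.Put;

// Hypothetical interface: put() hands back a Future that completes only once
// the server side reports the edit durable (WAL-synced, or flushed from the
// MemStore when the WAL is off). A TableOutputFormat-style writer could then
// collect the futures and block on them in close(), so the MR job only
// succeeds once the data really is persisted.
public interface FuturePutTable {
  Future<Void> put(Put put) throws IOException;
}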