ok - hadn't thought about it that way - but yeah with a default of 1 -
the semantics seem correct.

under high load - some batching would automatically happen at this
setting (or so one would think - not sure if hdfs appends are blocked
on pending syncs, in which case the batching wouldn't quite happen i
think - cc'ing Dhruba).

if the performance with a setting of 1 doesn't work out - we may need an
option to delay acks until actual syncs .. (most likely we would be
able to compromise on latency to get higher throughput - but wouldn't
be willing to compromise on data integrity)

Hey Joydeep,

This is actually intended this way but the name of the variable is
misleading. The sync is done only if forceSync is set or we have enough
entries to sync (the default threshold is 1). If someone wants to sync
only every 100 entries, for example, they would play with that configuration.

Hope that helps,


On Mon, Jan 11, 2010 at 3:46 PM, Joydeep Sarma wrote:

Hey HBase-devs,

we have been going through hbase code to come up to speed.

One of the questions was regarding the commit semantics. Thumbing through the RegionServer code that's appending to the wal:

syncWal -> HLog.sync -> addToSyncQueue -> syncDone.await()

and the log writer thread calls:

hflush(), syncDone.signalAll()

however hflush doesn't necessarily call a sync on the underlying log file:

if (this.forceSync ||
    this.unflushedEntries.get() >= this.flushlogentries) { ... sync() ... }

so it seems that if forceSync is not true, syncWal can unblock before a sync is called (and forceSync seems to be true only for metaregion()).

are we missing something - or is there a bug here? (the signalAll should be conditional on hflush having actually flushed something.)
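
One way to picture the conditional signalling asked about here is the sketch below. It is an illustration only, not the HLog implementation: the class, the sequence-number bookkeeping and the awaitSync timeout are assumptions, and the actual sync on the log file is reduced to a comment. Waiters are released only once a sync has covered their entry.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: waiters are woken only after a sync that covers their entry.
class ConditionalSyncSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition syncDone = lock.newCondition();
    private final int flushlogentries;   // sync threshold, as in the quoted condition
    private long appended = 0;           // sequence number of the last appended entry
    private long synced = 0;             // highest sequence number covered by a sync

    ConditionalSyncSketch(int flushlogentries) {
        this.flushlogentries = flushlogentries;
    }

    // Append one entry and return its sequence number.
    long append() {
        lock.lock();
        try {
            return ++appended;
        } finally {
            lock.unlock();
        }
    }

    // Called by the log writer thread; signals only if a sync actually happened.
    boolean hflush(boolean forceSync) {
        lock.lock();
        try {
            if (forceSync || appended - synced >= flushlogentries) {
                // A real writer would sync the log file here, ideally without
                // holding the lock so that appends are not blocked meanwhile.
                synced = appended;
                syncDone.signalAll();
                return true;
            }
            return false;                // nothing synced, so nobody is woken up
        } finally {
            lock.unlock();
        }
    }

    // Block until entry seq is covered by a sync, or the timeout expires.
    boolean awaitSync(long seq, long timeoutMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (synced < seq) {
                if (nanos <= 0) {
                    return false;        // timed out before the entry was synced
                }
                nanos = syncDone.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

With a threshold above 1 and little write traffic, a caller in this sketch could wait a long time for enough entries to accumulate, so a design along these lines would probably also need a time-based sync. That is the latency-for-throughput trade-off in question.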



group: dev @
categories: hbase, hadoop
posted: Jan 11, '10 at 11:56p
active: Jan 13, '10 at 10:56p


