
[CouchDB-dev] Bulk Docs

Dean Landolt
Mar 12, 2009 at 11:42 pm

On Thu, Mar 12, 2009 at 6:38 PM, Antony Blakey wrote:

On 13/03/2009, at 1:46 AM, Damien Katz wrote:

Atomic bulk docs is in the patch; it just doesn't do conflict checking. If
any docs conflict, they are saved anyway as conflicts. This means it's
really for message-queue functionality, not database consistency: your data
is safe and committed but might not be immediately available or consistent
between docs. The reason we are removing all-or-nothing with conflict
checking is that it doesn't work with replication (both offline and
clustering), as docs are not replicated in a single transaction or even in
update order. And getting it to work with partitioning would cause
unacceptable write performance. If we leave it, people will rely on the
behavior without understanding that it doesn't really work with the rest of
CouchDB.
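The semantics Damien describes ("saved anyway as conflicts") can be made concrete by looking at the request itself. The following is a minimal sketch, assuming the 0.9-era `_bulk_docs` body of the form `{"docs": [...]}` with an optional `all_or_nothing` field; other CouchDB versions may not accept that field, so check the API reference for the version you run. The helper name `build_bulk_docs_request` and the example documents are hypothetical. It only builds the HTTP request and never sends it.

```python
import json
import urllib.request


def build_bulk_docs_request(base_url, db, docs, all_or_nothing=False):
    """Build (but do not send) a POST to a CouchDB database's _bulk_docs endpoint.

    Hypothetical helper. Assumes the 0.9-era body {"docs": [...]} plus an
    optional "all_or_nothing" flag. Per the thread, with the flag set every
    document is written, and conflicting ones are stored as conflicts
    rather than rejected.
    """
    body = {"docs": docs}
    if all_or_nothing:
        body["all_or_nothing"] = True
    url = f"{base_url.rstrip('/')}/{db}/_bulk_docs"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Illustrative documents only; "orders" is a made-up database name.
    req = build_bulk_docs_request(
        "http://localhost:5984",
        "orders",
        [{"_id": "order-1", "_rev": "1-abc", "status": "paid"},
         {"_id": "invoice-1", "total": 42}],
        all_or_nothing=True,
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would actually send it to a running server.
```

Note what the flag does not buy you: a stale `_rev` in the batch is not rejected, so "atomic" here means "all written together", not "all consistent with each other".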

So if you are currently using bulk docs to guarantee inter-document
consistency, it already doesn't work with replication. It only works on a
single machine, so no master-slave or hot-standby setup would work, as
neither is guaranteed to be in a consistent state at any point.

The current bulk docs IS useful in a particular scenario.

It allows me, on a single node, to do transactional updates in response to
e.g. a web submit/AJAX call, without having to expose the conflict model to
the user and deal with conflicts in my single-node code.

I then have two distinct phases of operation for peers:

1. Replication is triggered by the user and they do nothing else until
replication completes, after which they have to resolve the conflicts
generated by replication. This code deals with conflicts and a resolution UI
and nothing else.

2. Normal operation: concurrent access by multiple applications and multiple
users. The code never sees a conflict, and hence the user interaction and
programming model are considerably simpler.
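Phase 1 of this model (resolve replication conflicts, then return to conflict-free operation) can be sketched as follows. This is a minimal sketch, assuming CouchDB's `?conflicts=true` query parameter on a document GET, which adds a `_conflicts` array of losing revisions, and the usual resolution pattern of writing a merged winner and deleting each losing revision in one `_bulk_docs` batch. The helper names (`conflicts_url`, `conflicting_revs`, `resolution_batch`) and the merge policy are hypothetical; deciding the merged content is the job of the application's resolution UI.

```python
import urllib.parse


def conflicts_url(base_url, db, doc_id):
    """Hypothetical helper: URL that fetches a doc along with its conflicting revisions."""
    return (f"{base_url.rstrip('/')}/{urllib.parse.quote(db, safe='')}/"
            f"{urllib.parse.quote(doc_id, safe='')}?conflicts=true")


def conflicting_revs(doc):
    """Return the losing revisions listed in `_conflicts` (empty list if none)."""
    return list(doc.get("_conflicts", []))


def resolution_batch(doc, merged_fields):
    """Build a _bulk_docs body that writes the merged winner on top of the
    current winning revision and deletes every losing revision."""
    # _id/_rev come last so merged_fields cannot override them.
    winner = {**merged_fields, "_id": doc["_id"], "_rev": doc["_rev"]}
    losers = [{"_id": doc["_id"], "_rev": rev, "_deleted": True}
              for rev in conflicting_revs(doc)]
    return {"docs": [winner] + losers}


if __name__ == "__main__":
    # Shape of a doc as returned by GET .../order-1?conflicts=true (illustrative values).
    fetched = {"_id": "order-1", "_rev": "3-aaa", "status": "paid",
               "_conflicts": ["3-bbb"]}
    print(conflicts_url("http://localhost:5984", "orders", "order-1"))
    print(resolution_batch(fetched, {"status": "shipped"}))
```

Once the batch is accepted the document has no remaining `_conflicts`, which is what lets the phase 2 code assume it never sees one.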

There are a few additional features useful in this model, the principal
ones being either (a) the ability to roll back a partial replication to deal
with network failures, or (b) the ability to maintain monotonic source writes,
which ensures that each replication step is consistent. To date neither of
these features has gained sufficient community support to be considered.

I've presented this model before, and it has been rejected as being
incompatible with the initial couchdb intentions, but in response to Tim
Parkin, this is the reason for my fork. There are more details to my effort
- pure binary bodies rather than JSON, unification of attachments with
documents, strict metadata/content separation, map/reduce over arbitrary
data, generalised derivation, an immutable model of fully reified state,
replication of operations rather than data - but maybe anyone interested can
contact me offlist - it's no longer CouchDB and I'm sure everyone's sick of
saying/reading "forget it, it's not going to happen" :)

Will this code still be Apache? Meaning, will some of these features be able
to meander their way back into Couch? I can totally understand the need for
a fork (differing goals sometimes cannot be reconciled), but if it's a
friendly fork, so to speak, everybody benefits, especially if some of
these features get rolled back in to make it easier for you to keep up with
trunk otherwise.