When updating documents must the entire document be resent or just the changed fields?
Michael
[CouchDB-user] Document Updates
-
Paul Davis at Nov 13, 2008 at 4:22 pm ⇧
-
Michael Ramirez at Nov 13, 2008 at 4:30 pm ⇧
Will this cause bandwidth issues when updating large documents if only a single field changes? I am afraid that as my documents grow larger, my app will get slower.
Michael
----- Original Message ----
From: Paul Davis <paul.joseph.davis@gmail.com>
To: couchdb-user@incubator.apache.org
Sent: Thursday, November 13, 2008 9:22:11 AM
Subject: Re: Document Updates
The entire document.
On Thu, Nov 13, 2008 at 11:20 AM, Michael Ramirez wrote:
> When updating documents must the entire document be resent or just the changed fields?
> Michael
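Because the whole document has to be sent, a client usually fetches the current revision, changes the field locally, and sends everything back. A minimal Python sketch of building that request body, assuming the document was fetched with its `_id` and `_rev` fields. The helper name `build_update_body` and the sample document are illustrative, not part of CouchDB:

```python
import json

def build_update_body(current_doc, changes):
    """Return the JSON body for a full-document update (hypothetical helper)."""
    if "_rev" not in current_doc:
        raise ValueError("need the current _rev to update an existing document")
    updated = dict(current_doc)   # copy, so the caller's dict is left alone
    updated.update(changes)       # only one field changes...
    return json.dumps(updated, sort_keys=True)  # ...but every field is sent

# Made-up document with one large field and one small field.
doc = {"_id": "post-1", "_rev": "1-abc", "title": "Hello", "body": "x" * 1000}
body = build_update_body(doc, {"title": "Hello again"})
```

The body stays roughly as large as the document no matter how small the change is, which is the bandwidth concern raised in this thread.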
Noah Slater at Nov 13, 2008 at 4:38 pm ⇧
On Thu, Nov 13, 2008 at 08:30:17AM -0800, Michael Ramirez wrote:
> Will this cause bandwidth issues when updating large documents if only a
> single field changes. I am afraid that as my documents grow larger my app gets
> slower.
I for one am interested to hear JSON diff proposals. I think this would make a
great addition to CouchDB. As best I can tell, this should really be done as an
external standardisation effort so the whole community could benefit. I don't
think using JavaScript to set the document attributes is a very good solution to
this. An entirely new Media Type is needed, IMHO.
-
Ayende Rahien at Nov 13, 2008 at 4:41 pm ⇧
I think that this should be pretty easily done using:
a) well defined pretty format output
b) standard diff
The reason for (a) is that you need this to get line breaks, which are
critical to diffing correctly.
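The "pretty format plus standard diff" idea can be sketched with Python's standard library. This is only an illustration: sorted keys and a fixed indent stand in for a "well defined" output format, and the function name `text_diff` is made up here:

```python
import difflib
import json

def text_diff(old, new):
    """Line-based diff of two JSON-compatible values (illustrative only)."""
    def pretty(value):
        # Sorted keys and a fixed indent put one member per line in a stable order.
        return json.dumps(value, sort_keys=True, indent=2).splitlines()
    return list(difflib.unified_diff(pretty(old), pretty(new), "old", "new", lineterm=""))

old = {"title": "Hello", "tags": ["a", "b"]}
new = {"title": "Hello again", "tags": ["a", "b"]}
diff = text_diff(old, new)
print("\n".join(diff))
```

The diff is only meaningful if both sides agree on exactly the same pretty-printing rules.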
Noah Slater at Nov 13, 2008 at 4:44 pm ⇧
On Thu, Nov 13, 2008 at 06:40:44PM +0200, Ayende Rahien wrote:
> I think that this should be pretty easily done using:
> a) well defined pretty format output
> b) standard diff
> The reason for (a) is that you need this to get line breaks, which are
> critical to diffing correctly.
It's a bit more complex than that: canonicalised JSON is still in its infancy,
so we would have to get the community to adopt that first. I know that people
have been discussing JSON diffs before; it may be worth looking up what's already
been done on this.
-
Paul Davis at Nov 13, 2008 at 7:07 pm ⇧
On Thu, Nov 13, 2008 at 11:43 AM, Noah Slater wrote:
> It's a bit more complex than that, canonicalised JSON is still in it's infancy,
> so we would have to get the community to adopt that first. I know that people
> have been discussing JSON diffs before, may be worth looking up what's already
> been done on this.
I think the JSON diff is a great idea. Unfortunately, the RFC is a bit
worrisome in one respect:
Section 2.2:
'The names within an object SHOULD be unique.'
I think that this could be a pretty big stumbling block if different
parsers start taking different interpretations of that, and I've already
seen implementations do slightly different things with repeated field
names.
Contemplating the canonical spec, I think I would prefer updating the
JSON spec's use of SHOULD to MUST. The canonical thing to me seems
more like a normalization method as opposed to a hard and fast spec.
Specifically, I could see lots of wasted cycles spent on keeping
canonicalization when it's needed relatively infrequently.
Assuming the change to MUST, I could probably write and implement a
first draft of the spec in a day.
How hard can it be to change an RFC?
Paul
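The repeated-name ambiguity is easy to reproduce. Python's standard `json` module silently keeps the last value for a repeated name, while an `object_pairs_hook` can enforce the stricter MUST reading. The hook name `reject_duplicates` is made up for this sketch:

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that treats the RFC's SHOULD as a MUST (illustrative)."""
    seen = {}
    for name, value in pairs:
        if name in seen:
            raise ValueError("duplicate name in object: %r" % name)
        seen[name] = value
    return seen

text = '{"a": 1, "a": 2}'

# Default behaviour: the later value wins and no error is reported.
lenient = json.loads(text)

# Strict behaviour: the same text is rejected.
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```

Other parsers may keep the first value or keep both pairs. A diff format has to rule that divergence out.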
-
Noah Slater at Nov 13, 2008 at 9:04 pm ⇧
-
Paul Davis at Nov 13, 2008 at 9:14 pm ⇧
On Thu, Nov 13, 2008 at 4:04 PM, Noah Slater wrote:
> On Thu, Nov 13, 2008 at 02:06:30PM -0500, Paul Davis wrote:
>> How hard can it be to change an RFC?
> I hope that's humour! :p
It's only *one* word! :D
Noah Slater at Nov 13, 2008 at 9:32 pm ⇧
On Thu, Nov 13, 2008 at 04:13:44PM -0500, Paul Davis wrote:
> It's only *one* word! :D
Heh. Well, I've been waiting for about three months for a simple update to the
Atom Relationships record with IANA. I doubt the IETF is any faster. In
addition, I should imagine it would have to be a revised standard, and these
things take ages. If you wanna do it, and get your name on the RFC, well...
I still think it's best to see what other effort has been done in this area.
-
Antony Blakey at Nov 13, 2008 at 9:50 pm ⇧
On 14/11/2008, at 3:13 AM, Noah Slater wrote:
> It's a bit more complex than that, canonicalised JSON is still in it's infancy,
> so we would have to get the community to adopt that first. I know that people
> have been discussing JSON diffs before, may be worth looking up what's already
> been done on this.
Given the XML/JSON isomorphism, I wonder if something like this:
http://www.springerlink.com/content/r1t6h8631868k615/
would be a good start for computing a diff.
The relevant section from XQuery Update,
http://www.w3.org/TR/xquery-update-10/#id-update-primitives, might be a useful
starting point for defining a JSON-encoded (recursive) EDL-based structural diff.
IME a structural diff is better for these purposes than a traditional
text-diff over a canonicalized format. It's certainly easier to
generate for clients.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
On the other side, you have the customer and/or user, and they tend to
do what we call "automating the pain." They say, "What is it we're
doing now? How would that look if we automated it?" Whereas, what the
design process should properly be is one of saying, "What are the
goals we're trying to accomplish and how can we get rid of all this
task crap?"
-- Alan Cooper
-
Noah Slater at Nov 13, 2008 at 10:03 pm ⇧
On Fri, Nov 14, 2008 at 08:19:22AM +1030, Antony Blakey wrote:
> The relevant section from XQuery Update,
> http://www.w3.org/TR/xquery-update-10/#id-update-primitives, might be useful
> starting point for defining a JSON-encoded (recursive) EDL-based structural
> diff.
I think an XQuery/XPath type solution would be very interesting.
-
Antony Blakey at Nov 13, 2008 at 10:39 pm ⇧
On 14/11/2008, at 8:33 AM, Noah Slater wrote:
> I think an XQuery/XPath type solution would be very interesting.
IMO the simplest thing that would work (ignoring representation) looks
something like this:
insert <json> in <jsonpath>
insert <json> after <jsonpath>
insert <json> before <jsonpath>
delete <jsonpath>
replace <jsonpath> with <json>
where jsonpath is roughly as: http://goessner.net/articles/JsonPath/
without the executable expressions.
Diff computation would undoubtedly generate a restricted subset of
jsonpath selectors, but it's worth supporting the wildcard/recursive
descent operations for clients.
Representing the update document as json itself would be clean, so an
EDL could look like this:
[
  { "replace": "$.post.comments[2].email",
    "with": "antony@linkuistics.com" },
  { "insert": { "email":.... }, "in": "$.post.comments" },
  { "insert": { "email":.... }, "after": "$.post.comments[5]" }
]
or, using a meta-encoding (which IMO is unnecessary)
[
  { "op": "replace", "path": "$.post.comments[2].email",
    "content": "antony@linkuistics.com" },
  { "op": "insert-in", "path": "$.post.comments",
    "content": { "email":.... } },
  { "op": "insert-after", "path": "$.post.comments[5]",
    "content": { "email":.... } }
]
I propose that these aren't declarative, but procedural, in the sense
that they are applied linearly and hence each path context is the
result of the preceding edits, rather than the original tree. This
complicates the encoding of diffs but results in a much simpler apply
mechanism. But maybe it would be worth using a declarative form with a
constant context - I'm unsure about the tradeoffs.
Antony Blakey
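A procedural apply step for an edit list of this shape could look roughly like the following Python sketch. It is not an implementation of any agreed format. It assumes a restricted path syntax (`$`, `.name`, `[index]`, no wildcards or recursive descent), treats "insert ... in" as appending to an array, and `parse_path` and `apply_edits` are hypothetical names:

```python
import copy
import re

# One path step: either ".name" or "[index]" (restricted subset, an assumption).
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")

def parse_path(path):
    """Turn "$.post.comments[2].email" into ["post", "comments", 2, "email"]."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    steps, pos = [], 1
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            raise ValueError("unsupported path syntax: %r" % path)
        steps.append(match.group(1) if match.group(1) is not None else int(match.group(2)))
        pos = match.end()
    return steps

def _resolve(node, steps):
    for step in steps:
        node = node[step]
    return node

def apply_edits(doc, edits):
    """Apply edits in order; each path is resolved against the result so far."""
    result = copy.deepcopy(doc)
    for edit in edits:
        if "replace" in edit:
            steps = parse_path(edit["replace"])
            _resolve(result, steps[:-1])[steps[-1]] = edit["with"]
        elif "delete" in edit:
            steps = parse_path(edit["delete"])
            del _resolve(result, steps[:-1])[steps[-1]]
        elif "insert" in edit and "in" in edit:
            _resolve(result, parse_path(edit["in"])).append(edit["insert"])
        elif "insert" in edit and ("after" in edit or "before" in edit):
            key = "after" if "after" in edit else "before"
            steps = parse_path(edit[key])
            parent, index = _resolve(result, steps[:-1]), steps[-1]
            parent.insert(index + 1 if key == "after" else index, edit["insert"])
        else:
            raise ValueError("unknown edit: %r" % (edit,))
    return result

doc = {"post": {"comments": [{"email": "a@example.com"}, {"email": "b@example.com"}]}}
edits = [
    {"replace": "$.post.comments[1].email", "with": "c@example.com"},
    {"insert": {"email": "d@example.com"}, "after": "$.post.comments[0]"},
    {"delete": "$.post.comments[2]"},  # index refers to the array after the insert
]
patched = apply_edits(doc, edits)
```

The last edit shows the procedural reading: index 2 names the element that was at index 1 before the insert.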
-
Noah Slater at Nov 13, 2008 at 11:41 pm ⇧
On Fri, Nov 14, 2008 at 09:09:03AM +1030, Antony Blakey wrote:
> IMO the simplest thing that would work (ignoring representation) looks
> something like this:
> insert <json> in <jsonpath>
> insert <json> after <jsonpath>
> insert <json> before <jsonpath>
> delete <jsonpath>
> replace <jsonpath> with <json>
> where jsonpath is roughly as: http://goessner.net/articles/JsonPath/
> without the executable expressions.
Hmm, this seems pretty cool.
I did some digging to see what else is out there:
* http://intertwingly.net/blog/2008/02/21/APP-Level-Patch
* http://blog.mozilla.com/rob-sayre/2008/02/15/restful-partial-updates/
* http://www.snellspace.com/wp/?p=895
* http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0316.html
* http://www.snellspace.com/wp/?p=902
I think I agree that if this were to hit CouchDB it should be done via the PATCH
method, which makes the most sense given the context.
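For illustration only, here is how a client might build such a PATCH request in Python. Nothing in this thread says CouchDB accepts PATCH. The URL, the `application/json` content type (a placeholder for the new media type discussed earlier) and the helper name are all assumptions, and the request is constructed but never sent:

```python
import json
import urllib.request

def build_patch_request(doc_url, edits):
    """Build (but do not send) a PATCH request carrying an edit list."""
    body = json.dumps(edits).encode("utf-8")
    return urllib.request.Request(
        doc_url,
        data=body,
        headers={"Content-Type": "application/json"},  # placeholder media type
        method="PATCH",
    )

req = build_patch_request(
    "http://localhost:5984/blog/post-1",  # hypothetical database and document id
    [{"replace": "$.post.comments[2].email", "with": "antony@linkuistics.com"}],
)
```

Only the edit list travels over the wire, so the request size tracks the size of the change rather than the size of the document.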
-
Adam Kocoloski at Nov 14, 2008 at 12:03 am ⇧
Partial updates are also a very popular discussion topic on the
restful-json Google group:
http://groups.google.com/group/restful-json
But AFAIK no consensus has been reached yet. Best, Adam
Chris Anderson at Nov 14, 2008 at 12:10 am ⇧
On Thu, Nov 13, 2008 at 3:37 PM, Noah Slater wrote:
> I did some digging to see what else is out there:
> * http://intertwingly.net/blog/2008/02/21/APP-Level-Patch
> * http://blog.mozilla.com/rob-sayre/2008/02/15/restful-partial-updates/
> * http://www.snellspace.com/wp/?p=895
> * http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0316.html
> * http://www.snellspace.com/wp/?p=902
Forgive me for throwing out a loose-cannon idea, but would it be
easiest to provide an API where the user sends a Javascript function
to CouchDB via the PATCH method? The function could look something
like:
function(doc) {
  doc.my_field = "new value";
  doc.existing_array[3] = "another new value";
  doc.new_array = ["a", "b", 3];
  return doc;
}
-
Antony Blakey at Nov 14, 2008 at 12:32 am ⇧
On 14/11/2008, at 10:39 AM, Chris Anderson wrote:
> Forgive me for throwing out a loose-cannon idea, but would it be
> easiest to provide an API where the user sends a Javascript function
> to CouchDB via the PATCH method? The function could look something
> like:
> function(doc) {
>   doc.my_field = "new value";
>   doc.existing_array[3] = "another new value";
>   doc.new_array = ["a", "b", 3];
>   return doc;
> }
I thought that javascript wasn't part of the Couch core? JSON isn't
javascript, and all uses of javascript *could* be replaced with e.g.
Ruby (or my interest, Smalltalk), which is why there is a "language"
attribute on the views.
Your proposal would change that.
Antony Blakey
-
Antony Blakey at Nov 14, 2008 at 12:49 am ⇧
On 14/11/2008, at 11:01 AM, Antony Blakey wrote:
> I thought that javascript wasn't part of the Couch core? JSON isn't
> javascript, and all uses of javascript *could* be replaced with e.g.
> Ruby (or my interest, Smalltalk), which is why there is a "language"
> attribute on the views.
> Your proposal would change that.
You could use the view mechanism, and attach a "language" attribute,
and have this be a general transformation interface, which would
indeed be very nice. For efficiency you would want to apply this over
sets of documents, and probably in a transactional context like bulk
update does now.
However... Damien wants something to use in replication, which would
mean that javascript would then become a required, rather than an
optional part of Couch, because replication would require it (unless
you made the replication diff generator pluggable ... but why go
there?). The benefit of the declarative diff format is that applying a
diff can be done within Couch.
Antony Blakey
-
Ara.t.howard at Nov 14, 2008 at 1:02 am ⇧
On Nov 13, 2008, at 5:49 PM, Antony Blakey wrote:
> You could use the view mechanism, and attach a "language" attribute,
> and have this be a general transformation interface, which would
> indeed be very nice. For efficiency you would want to apply this
> over sets of documents, and probably in a transactional context like
> bulk update does now.
> However... Damien wants something to use in replication, which would
> mean that javascript would then become a required, rather than an
> optional part of Couch, because replication would require it (unless
> you made the replication diff generator pluggable ... but why go
> there?). The benefit of the declarative diff format is that applying
> a diff can be done within Couch.
couldn't these queries run in the view server? in fact any mechanism
which would allow the view server could accomplish this with a
protocol between it and the db server. basically it's an addition to
the map/reduce functionality which would alter documents on the fly.
a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama
Chris Anderson at Nov 14, 2008 at 1:22 am ⇧
On Thu, Nov 13, 2008 at 5:02 PM, ara.t.howard wrote:
> couldn't these queries run in the view server? in fact any mechanism which
> would allow the view server could accomplish this with a protocol between it
> and the db server. basically it's an addition to the map/reduce
> functionality which would alter documents on the fly.
Antony's right that currently replication does not depend on the
availability of the view server. And I think it is smart to avoid that
dependence, when possible.
Alas, my attempt to bypass all the craziness that is canonical JSON
has come short of that. Oh wells...
-
Antony Blakey at Nov 14, 2008 at 1:41 am ⇧
On 14/11/2008, at 11:50 AM, Chris Anderson wrote:
> Alas, my attempt to bypass all the craziness that is canonical JSON,
> has come short of that. Oh wells...
My proposal doesn't require canonical JSON because it is structural
rather than textual. That's one reason I think it's a good approach.
Antony Blakey
-
Paul Davis at Nov 14, 2008 at 2:34 am ⇧
I don't think we need canonical JSON.
The Spec definitely needs to be disambiguated though. As I see it
there are two interpretations:
1. Order of fields matters which means repeated fields are ok
2. Order does not matter which means repeated fields are NOT ok
It doesn't matter which is chosen, but one of them must be to make this work.
Also, I got bored. So I implemented JSON diff in python for Case #2.
http://www.davispj.com/svn/projects/json-diff/json-diff.py
I gotta jet, but when I get home in a bit I'm gonna write a JSON fuzz
library and then pound the diff thing with it.
Not sure if it's obvious or not, but switching from case 2 to 1 is
straightforward. Also, my current array diff implementation is kinda
whack. And indels screw the rest of the diff, as in, it's not so much a
diff as a delete rest and add new. Getting this optimal is actually an
N^2 runtime algorithm via dynamic programming (Smith-Waterman style).
Also, do note that the erlang parser and python (and i assume ruby is
in the python boat) have different behaviors in respect to the 2
cases. Erlang is Case 1, python is case 2.
Paul
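The array problem above (an insertion or deletion shifting everything after it) is what sequence-alignment style dynamic programming addresses. The sketch below is not Paul's implementation. It uses a plain longest-common-subsequence table, which belongs to the same family and costs O(n*m) time and space, and the function name `array_diff` is made up:

```python
def array_diff(old, new):
    """Return ("keep" | "delete" | "insert", item) operations turning old into new."""
    n, m = len(old), len(new)
    # lcs[i][j] = length of the longest common subsequence of old[i:] and new[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    # Walk the table from the start, preferring the move that keeps the LCS longest.
    ops, i, j = [], 0, 0
    while i < n and j < m:
        if old[i] == new[j]:
            ops.append(("keep", old[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(("delete", old[i]))
            i += 1
        else:
            ops.append(("insert", new[j]))
            j += 1
    ops.extend(("delete", item) for item in old[i:])
    ops.extend(("insert", item) for item in new[j:])
    return ops

ops = array_diff(["a", "b", "c", "d"], ["a", "x", "b", "c", "d"])
```

A single inserted element shows up as one "insert" operation instead of a delete-the-rest, add-the-rest pair.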
Antony Blakey at Nov 14, 2008 at 2:53 am ⇧
On 14/11/2008, at 1:04 PM, Paul Davis wrote:
> I don't think we need canonical JSON.
> The Spec definitely needs to be disambiguated though. As I see it
> there are two interpretations:
> 1. Order of fields matters which means repeated fields are ok
> 2. Order does not matter which means repeated fields are NOT ok
Given that JSON is executable Javascript, 2 is the only interpretation
that allows for roundtrip equivalence.
> Not sure if it's obvious or not, but switching from case 2 to 1 is
> straightforward. Also, my current array diff implementation is kinda
> whack. And indels screw the rest of the diff, as in, its not so much a
> diff as a delete rest and add new. Getting this optimal is actually an
> N^2 runtime algorithm via dynamic programming (smith-waterman style)
The algorithm described here: http://www.springerlink.com/content/r1t6h8631868k615/
is O(n), and although it isn't optimal, I'm guessing the performance
stability makes up for it.
Antony Blakey
-
Noah Slater at Nov 14, 2008 at 4:06 am ⇧
On Fri, Nov 14, 2008 at 01:22:58PM +1030, Antony Blakey wrote:
>> 1. Order of fields matters which means repeated fields are ok
>> 2. Order does not matter which means repeated fields are NOT ok
> Given that JSON is executable Javascript, 2 is the only interpretation
> that allows for roundtrip equivalence.
I fear this is rather a large jump to conclusion.
-
Antony Blakey at Nov 14, 2008 at 7:02 am ⇧
On 14/11/2008, at 2:36 PM, Noah Slater wrote:
> On Fri, Nov 14, 2008 at 01:22:58PM +1030, Antony Blakey wrote:
>>> 1. Order of fields matters which means repeated fields are ok
>>> 2. Order does not matter which means repeated fields are NOT ok
>> Given that JSON is executable Javascript, 2 is the only interpretation
>> that allows for roundtrip equivalence.
> I fear this is rather a large jump to conclusion.
I'm only claiming that *roundtrip equivalence* is only possible if you
don't allow duplicate keys, because JSON, being a serialization format
for (limited) Javascript data structures, cannot be generated with
duplicate keys from those data structures.
Being executable, its interpretation is defined by the Javascript
spec. JSON is a serialization of a (limited) Javascript data
structure. Javascript hashes don't allow for duplicate keys, nor do
they (AFAIR) provide any ordering guarantees. I contend that the
semantics of JSON follow from this, and as an extension I wonder if
any JSON that isn't a serialization of some Javascript data structure
should not be valid JSON. OTOH, the operational interpretation would
suggest that maybe the text representation can have duplicate keys,
but that the data structure that it represents (which is what we
should care about) does not.
Antony Blakey
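The roundtrip point can be shown with Python's standard `json` module standing in for a parser that maps objects to hashes. This is a sketch of the two readings, not a statement about what any particular CouchDB component does:

```python
import json

text = '{"b": 1, "a": 2, "b": 3}'

# Case 2 reading: an object is a mapping, so the repeated name collapses.
as_mapping = json.loads(text)
roundtrip = json.dumps(as_mapping)

# Case 1 reading: an object is an ordered list of name/value pairs.
as_pairs = json.loads(text, object_pairs_hook=list)
```

Under the mapping reading the text cannot be regenerated from the data structure, because the first "b" is gone. Under the pairs reading all three members survive in document order.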
-
Paul Davis at Nov 15, 2008 at 1:38 am ⇧
I wrote a fuzz thing to go along with the diff testing.
You can get it with:
$ sudo easy_install jsontools
# Examples
from StringIO import StringIO
import jsontools
stream = StringIO()
# Fuzzy objects
fj = jsontools.FuzzyJson()
obj1 = fj.generate(1).next()
obj2 = fj.modify(obj1)
# Diff the objects
jsontools.jsondiff(obj1, obj2, stream=stream)
# Apply the diff
stream.seek(0)
result = jsontools.jsonapply(stream, obj1)
# Compare them
assert jsontools.jsoncmp(result, obj2) == 2
Any comments?
Paul
Paul Davis at Nov 15, 2008 at 1:42 am ⇧
On Fri, Nov 14, 2008 at 8:37 PM, Paul Davis wrote:
> assert jsontools.jsoncmp(result, obj2) == 2
== True
Ayende Rahien at Nov 14, 2008 at 4:18 am ⇧
Take into account that the view server is explicitly a separate
process. Requiring it to process incoming requests would create a very
high overhead.
Ara.t.howard at Nov 14, 2008 at 4:53 am ⇧
On Nov 13, 2008, at 9:18 PM, Ayende Rahien wrote:
> Take into account that the view server is explicitly a separate process.
> Requiring it to process incoming request would create a very high overhead.

i'm just saying - if people could write javascript to execute on the
server people would really be singing hallelujah. from my perspective
it's only about 10000000% as cool as being able to plug in a different
language as a view server.
2 cts.
a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama -
Paul Davis at Nov 14, 2008 at 5:07 am ⇧
The JS PATCH function is definitely an interesting idea, but I think
it is not at all going to fix the issue. How many use cases are going to
be able to reduce an update operation to a function?
Most use cases that I see are of the nature: "Download JSON,
deserialize to native language, mutate, serialize, send to server".
Anyone that wants to write an abstract "mutate" -> JS function thing has
props in my book.
It is foreseeable that you could treat it as a stored procedure though.
As in the function signature becomes "function(doc, input)" and input
is a complex JSON object that is used in the update. This almost has
actual use in terms of the earlier thread on transaction semantics
that I assume would be implementable. But really the transaction idea
is a whole new can of worms, in that couch would have to be allowed
to request multiple documents in one transaction etc.
I think, in line with Noah's original sentiment, that getting some diff
system set up way outside of couch in JSON land, implementable
in any language for anything, is the best bet.
Paul

On Thu, Nov 13, 2008 at 11:53 PM, ara.t.howard wrote:
> On Nov 13, 2008, at 9:18 PM, Ayende Rahien wrote:
>> Take into account that the view server is explicitly a separate process.
>> Requiring it to process incoming request would create a very high overhead.
> i'm just saying - if people could write javascript to execute on the server
> people would really be singing hallelujah. from my perspective it's only
> about 10000000% as cool as being to plugin a different language in as a view
> server.
> 2 cts.
a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being better.
simply reflect on that.
h.h. the 14th dalai lama -
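To make the stored-procedure variant from Paul's message concrete, here is a minimal sketch of an update function with the signature function(doc, input). Everything around it is hypothetical and exists only for illustration: the in-memory `store`, the `applyUpdate` helper, the `addTag` procedure and the integer revision counter are assumptions, not CouchDB API.

```javascript
// Hypothetical in-memory "database": id -> document. Not a CouchDB API.
const store = new Map([
  ["post-1", { _id: "post-1", _rev: 1, title: "Hello", tags: ["intro"] }],
]);

// A stored procedure in the sense discussed above: it receives the current
// document plus a JSON "input" object and returns the updated document.
function addTag(doc, input) {
  return { ...doc, tags: [...doc.tags, input.tag] };
}

// Hypothetical helper: load the document, run the procedure on a copy,
// bump the (assumed integer) revision, and store the result.
function applyUpdate(id, procedure, input) {
  const current = store.get(id);
  if (!current) throw new Error("not found: " + id);
  const copy = JSON.parse(JSON.stringify(current)); // keep the original intact
  const updated = procedure(copy, input);
  updated._rev = current._rev + 1;
  store.set(id, updated);
  return updated;
}

const result = applyUpdate("post-1", addTag, { tag: "couchdb" });
console.log(result.tags); // [ 'intro', 'couchdb' ]
```

In this shape only the small `input` object would need to travel from the client; the full document stays on the server side.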
Ara.t.howard at Nov 14, 2008 at 12:37 am ⇧
On Nov 13, 2008, at 5:09 PM, Chris Anderson wrote:
> Forgive me for throwing out a loose-cannon idea, but would it be
> easiest to provide an API where the user sends a Javascript function
> to CouchDB via the PATCH method? The function could look something
> like:
>
> function(doc) {
>   doc.my_field = "new value";
>   doc.existing_array[3] = "another new value";
>   doc.new_array = ["a", "b", 3];
>   return doc;
> }
> --

yeah - i loooooooove this idea!
a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama -
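As a rough illustration of what the receiving end of Chris's proposal might do, the sketch below compiles function source text sent by a client and runs it against a copy of a stored document. The helper name `applyPatchFunction` and the sample document are assumptions. `new Function` provides no sandboxing, so this shows only the mechanics, not how CouchDB would or should evaluate untrusted code.

```javascript
// Source text of the update function, as a client might send it in the
// request body. The field names are the ones from the example above.
const patchSource = `function(doc) {
  doc.my_field = "new value";
  doc.existing_array[3] = "another new value";
  doc.new_array = ["a", "b", 3];
  return doc;
}`;

// Hypothetical helper: compile the source and run it on a deep copy.
// `new Function` offers no sandboxing; this is for illustration only.
function applyPatchFunction(doc, source) {
  const fn = new Function("return (" + source + ");")();
  if (typeof fn !== "function") throw new TypeError("body is not a function");
  return fn(JSON.parse(JSON.stringify(doc)));
}

const before = { _id: "example", existing_array: [0, 1, 2, 3] };
const after = applyPatchFunction(before, patchSource);
console.log(after.my_field, after.existing_array[3]);
```

Working on a copy means a function that throws halfway through leaves the stored document untouched.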
Antony Blakey at Nov 14, 2008 at 12:33 am ⇧
On 14/11/2008, at 10:07 AM, Noah Slater wrote:
> I did some digging to see what else is out there:
> * http://intertwingly.net/blog/2008/02/21/APP-Level-Patch
> * http://blog.mozilla.com/rob-sayre/2008/02/15/restful-partial-updates/
> * http://www.snellspace.com/wp/?p=895
> * http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0316.html
> * http://www.snellspace.com/wp/?p=902
> I think I agree that if this was to hit CouchDB it should be done
> via the PATCH method, makes the most sense given the context.

My only concern with PATCH is that it isn't HTTP 1.1. When I was
working with WebDAV we continually had problems with proxies that
didn't deal with non-HTTP-1.1 methods.
IMO the only other alternative is a POST with either a content-type,
although I feel uneasy about that if the content actually *is* JSON,
or a query parameter.
Alternatively you could use a POST to an extended URL, but that
interferes with attachments. And as I understand it, that would only
truly qualify as REST if it was included in a document e.g.
discoverable rather than specified.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Did you hear about the Buddhist who refused Novocain during a root
canal?
His goal: transcend dental medication.
-
Antony Blakey at Nov 14, 2008 at 2:03 am ⇧
On 14/11/2008, at 10:07 AM, Noah Slater wrote:
> I think I agree that if this was to hit CouchDB it should be done
> via the PATCH method, makes the most sense given the context.

You would want to allow partial updates in a bulk operation, so any
packaging would need to be usable in that context as well. Given
updates need to be handled separately, maybe deletions should be as
well.

{
  "docs": [
    /* Just for backwards compatibility ... but does that matter for
       an alpha product ? */
    ... as now ...
  ],
  "PUT": [
    /* As now with docs, but not allowing "delete":true ? */
    { "_id": ..., "_rev": ..., ... }
    ...
  ],
  "PATCH": [
    { "_id": ..., "_rev": ..., deltas: [ { "replace":... }, ... ] }
    ...
  ],
  "DELETE": [
    { "_id": ..., "_rev": ... },
    ...
  ]
}
This has the benefit of (roughly) representing the HTTP methods that
it aggregates.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
It is no measure of health to be well adjusted to a profoundly sick
society.
-- Jiddu Krishnamurti
-
Antony Blakey at Nov 14, 2008 at 2:22 am ⇧
On 14/11/2008, at 12:32 PM, Antony Blakey wrote:
> {
>   "docs": [
>     /* Just for backwards compatibility ... but does that matter for
>        an alpha product ? */
>     ... as now ...
>   ],
>   "PUT": [
>     /* As now with docs, but not allowing "delete":true ? */
>     { "_id": ..., "_rev": ..., ... }
>     ...
>   ],
>   "PATCH": [
>     { "_id": ..., "_rev": ..., deltas: [ { "replace":... }, ... ] }
>     ...
>   ],
>   "DELETE": [
>     { "_id": ..., "_rev": ... },
>     ...
>   ]
> }
> This has the benefit of (roughly) representing the HTTP methods that
> it aggregates.

On second thought, given that it represents an aggregation of commands
that have an explicit ordering, maybe it shouldn't be grouped by
method but instead use the method as a key. Like this:

[
  { "delete": { "_id": ..., "_rev": ... } },
  { "put": { "_id": ..., "_rev": ..., ... } },
  { "patch": { "_id": ..., "_rev": ... }, "with": [ { "replace": ..., "with": ... }, ... ] },
  ...
]

The benefit of this is that it is easier to reason about and
generate if your client code is doing deletes and inserts of documents
with the same id. It accurately represents adding a transactional
boundary without requiring a change in semantics.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
When I hear somebody sigh, 'Life is hard,' I am always tempted to ask,
'Compared to what?'
-- Sydney Harris
-
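A small sketch may help show why the ordered form is easier to reason about when the same id is deleted and re-inserted. It applies a list of operations in sequence to an in-memory store. The store, the function name `applyOperations`, and above all the delta shape are assumptions: the thread leaves the delta format elided, so the sketch reads "replace" as a top-level field name and "with" as its new value. Revision checks are omitted.

```javascript
// Hypothetical in-memory store: id -> document.
const docs = new Map([
  ["a", { _id: "a", _rev: "1", title: "old a" }],
  ["b", { _id: "b", _rev: "1", title: "old b", views: 1 }],
]);

// Apply operations strictly in the order given.
// ASSUMPTION: each delta is { replace: <top-level field>, with: <value> };
// the thread leaves the delta format elided.
function applyOperations(store, operations) {
  for (const op of operations) {
    if (op.delete) {
      store.delete(op.delete._id);
    } else if (op.put) {
      store.set(op.put._id, op.put);
    } else if (op.patch) {
      const doc = store.get(op.patch._id);
      if (!doc) throw new Error("cannot patch missing doc " + op.patch._id);
      for (const delta of op.with) doc[delta.replace] = delta.with;
    } else {
      throw new Error("unknown operation");
    }
  }
  return store;
}

applyOperations(docs, [
  { delete: { _id: "a", _rev: "1" } },
  { put: { _id: "a", _rev: "2", title: "new a" } }, // same id, re-inserted
  { patch: { _id: "b", _rev: "1" }, with: [{ replace: "views", with: 2 }] },
]);
console.log(docs.get("a").title, docs.get("b").views); // new a 2
```

With the grouped-by-method form, the relative order of the delete and the put for id "a" would have to be defined by convention; here it is simply the order of the list.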
Noah Slater at Nov 14, 2008 at 4:03 am ⇧
On Fri, Nov 14, 2008 at 12:32:18PM +1030, Antony Blakey wrote:
> You would want to allow partial updates in a bulk operation, so any
> packaging would need to be usable in that context as well. Given updates
> need to be handled separately, maybe deletions should be as well. ...
> "PUT": [ ...
> "PATCH": [ ...
> "DELETE": [

We shouldn't be tunneling verbs through media types, this is antithetical to the
principles of REST and would harm all manner of possible intermediary clients.
-
Antony Blakey at Nov 14, 2008 at 7:05 am ⇧
On 14/11/2008, at 2:32 PM, Noah Slater wrote:
> On Fri, Nov 14, 2008 at 12:32:18PM +1030, Antony Blakey wrote:
>> You would want to allow partial updates in a bulk operation, so any
>> packaging would need to be usable in that context as well. Given
>> updates need to be handled separately, maybe deletions should be as
>> well. ...
>> "PUT": [ ...
>> "PATCH": [ ...
>> "DELETE": [
> We shouldn't be tunneling verbs through media types, this is
> antithetical to the principles of REST and would harm all manner of
> possible intermediary clients.

I'm not tunneling verbs, I'm just re-using the names of the methods
that would normally be used as selectors. I wasn't implying anything
more than that.
Couch's bulk operation already has this issue. You delete a document
using the DELETE verb, yet in a bulk operation you set the "_deleted"
special attribute. That is in effect tunneling the DELETE, using a
different representation, within a POST.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies, and the
other way is to make it so complicated that there are no obvious
deficiencies.
-- C. A. R. Hoare
-
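The asymmetry Antony points out can be seen by putting the two request shapes side by side. The sketch below only builds request descriptions and sends nothing. The database name, ids and revisions are made up, and the helper names `singleDelete` and `bulkDelete` are hypothetical.

```javascript
// Build (but do not send) the two request shapes being compared.
// Database name, ids and revisions are made up for illustration.
function singleDelete(db, doc) {
  return {
    method: "DELETE",
    path: "/" + db + "/" + encodeURIComponent(doc._id) +
          "?rev=" + encodeURIComponent(doc._rev),
    body: null,
  };
}

function bulkDelete(db, docsToDelete) {
  return {
    method: "POST",
    path: "/" + db + "/_bulk_docs",
    // Deletion is expressed inside the representation, not by the verb.
    body: JSON.stringify({
      docs: docsToDelete.map((d) => ({ _id: d._id, _rev: d._rev, _deleted: true })),
    }),
  };
}

const doomed = [{ _id: "x", _rev: "3" }, { _id: "y", _rev: "7" }];
console.log(singleDelete("mydb", doomed[0]).path); // /mydb/x?rev=3
console.log(bulkDelete("mydb", doomed).body);
```

In the single case the HTTP method carries the intent; in the bulk case the intent moves into the "_deleted" attribute of each document inside a POST body.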
Noah Slater at Nov 14, 2008 at 11:57 am ⇧
On Fri, Nov 14, 2008 at 05:35:03PM +1030, Antony Blakey wrote:
> I'm not tunneling verbs, I'm just re-using the names of the methods that would
> normally be used as selectors. I wasn't implying anything more than that. ...
> You delete a document using the DELETE verb, yet in a bulk operation you set
> the "_deleted" special attribute. That is in effect tunneling the DELETE,
> using a different representation, within a POST.

A RESTful system should work by exchanging representations of resources.
As best I understand it, if you want to modify a resource in a way that is not a
direct update, move or delete you should use a separate media type, something
like application/diff+json if it existed. A JSON diff could include ways to
delete and update multiple documents at the same time, a bit like UNIX diff is
able to specify filenames. This could be used for single or bulk updates.
Of course, this feels very similar to your original proposal, which leaves me a
little confused. Throwing about JSON with keys such as "POST" and "DELETE" feels
very RPC-like. Perhaps the difference is the use of a separate media type.
I am eager to be corrected by any resident RESTafarians. For me, REST is a bit
like Zen. Sometimes I think I understand it totally, and other times I'm
convinced that I don't understand it at all.
Best,
-
Antony Blakey at Nov 14, 2008 at 12:23 pm ⇧
On 14/11/2008, at 10:23 PM, Noah Slater wrote:
> On Fri, Nov 14, 2008 at 05:35:03PM +1030, Antony Blakey wrote:
>> I'm not tunneling verbs, I'm just re-using the names of the methods
>> that would normally be used as selectors. I wasn't implying anything
>> more than that. ...
>> You delete a document using the DELETE verb, yet in a bulk operation
>> you set the "_deleted" special attribute. That is in effect tunneling
>> the DELETE, using a different representation, within a POST.
> A RESTful system should work by exchanging representations of resources.
> As best I understand it, if you want to modify a resource in a way that
> is not a direct update, move or delete you should use a separate media
> type, something like application/diff+json if it existed. A JSON diff
> could include ways to delete and update multiple documents at the same
> time, a bit like UNIX diff is able to specify filenames. This could be
> used for single or bulk updates.

Yes, a content-type was something I suggested, but it didn't seem
right. In a strictly RESTful sense, maybe it does make sense however.

> Of course, this feels very similar to your original proposal, which
> leaves me a little confused. Throwing about JSON with keys such as "POST"
> and "DELETE" feels very RPC-like. Perhaps the difference is the use of a
> separate media type.

These items, such as 'post' and 'delete' can be equated to the
'replace'/'insert' et al of my diff proposal, but operating over
documents rather than JSON trees. The fact that they are so named was
*purely* an attempt on my part to make it obvious what
(singular) resource operation (identified by HTTP method) was
equivalent to that document-level operation, and was in no way an
attempt to tunnel the HTTP mechanism.
The current way _bulk_docs does deletion doesn't feel right. I do
think there should be some isomorphism between the _bulk_docs
structure and the operations one would do without using the _bulk_docs
mechanism, hence my suggestion (but the second temporal ordering, not
the first operation-type ordering).

> I am eager to be corrected by any resident RESTafarians. For me,
> REST is a bit like Zen. Sometimes I think I understand it totally, and
> other times I'm convinced that I don't understand it at all.

I don't think Couch is truly REST. Certainly _bulk_docs isn't. The
fact that there are URI patterns means it's not REST, at least not if
I've understood Roy's recent communications/frustrations, such as
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven.
In particular, point 4 seems to disqualify any system, (including
Couch) that needs the documents in the "Reference" section of the Wiki.
To be REST it has to be just like the web. Using links discovered from
documents, never constructing them according to some scheme.
However, what does it matter? REST certainly is a slippery sucker, but
that may be because we want it to be more generally applicable than it
is. Couch doesn't have to be REST, and I suspect that it in fact
cannot be.
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Man will never be free until the last king is strangled with the
entrails of the last priest.
-- Denis Diderot
-
Noah Slater at Nov 14, 2008 at 12:36 pm ⇧
On Fri, Nov 14, 2008 at 10:52:35PM +1030, Antony Blakey wrote:
>> I am eager to be corrected by any resident RESTafarians. For me, REST is a
>> bit like Zen. Sometimes I think I understand it totally, and other times I'm
>> convinced that I don't understand it at all.
> I don't think Couch is truly REST. Certainly _bulk_docs isn't. The fact that
> there are URI patterns means it's not REST, at least not if I've understood
> Roy's recent communications/frustrations, such as
> http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven.
> In particular, point 4 seems to disqualify any system, (including Couch) that
> needs the documents in the "Reference" section of the Wiki.
> To be REST it has to be just like the web. Using links discovered from
> documents, never constructing them according to some scheme.
> However, what does it matter? REST certainly is a slippery sucker, but that
> may be because we want it to be more generally applicable than it is. Couch
> doesn't have to be REST, and I suspect that it in fact cannot be.

Sure, there are some areas, such as hypertext as the engine of application
state, that CouchDB does not use, but looking back at Roy's original doctoral
thesis, REST seems to be predominantly about architecture constraints, of which
this was not one of them. CouchDB embraces all of the mentioned constraints in
some way or another; namely client/server, statelessness, cacheability, uniform
interfaces, and layered systems.
So, I guess RESTful or non-RESTful is a false dichotomy in this respect.
Additionally, I agree with you on the state of current bulk operations. I think
there is room for improvement, and hopefully some kind of differential update
could be possible at the same time.
-
Noah Slater at Nov 14, 2008 at 12:41 pm ⇧
On Fri, Nov 14, 2008 at 12:35:45PM +0000, Noah Slater wrote:
> Sure, there are some areas, such as hypertext as the engine of application
> state, that CouchDB does not use, but looking back at Roy's original doctoral
> thesis, REST seems to be predominantly about architecture constraints, of
> which this was not one of them. CouchDB embraces all of the mentioned
> constraints in some way or another; namely client/server, statelessness,
> cacheability, uniform interfaces, and layered systems.

Careless wording on my part, hypertext is clearly an architectural
constraint. However, the weight placed on it within his thesis does not seem to
be as great as some of the other core constraints.
-
Chris Anderson at Nov 14, 2008 at 5:41 pm ⇧
On Fri, Nov 14, 2008 at 4:22 AM, Antony Blakey wrote:
> I don't think Couch is truly REST. Certainly _bulk_docs isn't. The fact that
> there are URI patterns means it's not REST, at least not if I've understood
> Roy's recent communications/frustrations, such as
> http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven.
> In particular, point 4 seems to disqualify any system, (including Couch)
> that needs the documents in the "Reference" section of the Wiki.
> To be REST it has to be just like the web. Using links discovered from
> documents, never constructing them according to some scheme.

Ah, shaving the yak shed.
But I've been thinking about this as well. If we were to attack this
problem of "High REST" head-on, I think the appropriate course would
be to define a media type application/couch+json or something. The
media type's definition would explain how to get from "id" params to
document URIs, etc. Doing that is all it would take to be (mostly)
RESTful. I think the existence of _bulk_docs POST doesn't break
RESTfulness, either. There's no law that says a system can't define
RESTful resources alongside RPC endpoints.
I'm not sure how meditating on the Zen of REST will help us get
json-diffs right, but it sure can't hurt.
Chris
-
Antony Blakey at Nov 14, 2008 at 9:57 pm ⇧
On 15/11/2008, at 4:10 AM, Chris Anderson wrote:
> But I've been thinking about this as well. If we were to attack this
> problem of "High REST" head-on, I think the appropriate course would
> be to define a media type application/couch+json or something. The
> media type's definition would explain how to get from "id" params to
> document URIs, etc.

I think some equivalent to base-uri would be needed to avoid the
definition of the media type including requirements on the URL
structure of the server. Either that or the document would include the
resource URL.
A landing page with URLs for the design documents would also be
needed. View definitions would need a unique media type because
currently their meaning is dependent on their location. But maybe I'm
misunderstanding REST. So easy.

> Doing that is all it would take to be (mostly)
> RESTful. I think the existence of _bulk_docs POST doesn't break
> RESTfulness, either. There's no law that says a system can't define
> RESTful resources alongside RPC endpoints.

Agreed, but IMO Couch shouldn't claim to be at all RESTful if it
doesn't meet the criteria. It might be REST-like. If some parts are
RESTful and others not, then the claim should be that it includes some
RESTful interfaces.
It might seem nitpicking, but the definition of REST is voided when
things claim to be RESTful that in fact aren't, and it's rarely used
correctly. I'm not even sure what conformance looks like.

> I'm not sure how meditating on the Zen of REST will help us get
> json-diffs right, but it sure can't hurt.

Sorry, I distracted the discussion when I mentioned _bulk_docs.
I would start working on an implementation of the apply-end of my diff
proposal, but I don't want to waste time if the powers-that-be don't
think it's the right way to go.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Some defeats are instalments to victory.
-- Jacob Riis
-
Antony Blakey at Nov 14, 2008 at 10:31 pm ⇧
On 15/11/2008, at 8:26 AM, Antony Blakey wrote:
> A landing page with URLs for the design documents would also be
> needed. View definitions would need a unique media type because
> currently their meaning is dependent on their location. But maybe
> I'm misunderstanding REST. So easy.

Thinking about this, it would not only be RESTful to have the server
root page contain links such as the _bulk_docs URL and a _design/
index page, it would also make good documentation if it was an HTML
page. The name of the anchor or the rel attribute could serve to
indicate link functions.

<a href='_bulk_docs' rel='bulkDocumentsRPC'>Bulk Document Operations</a>
<a href='_design/'>Design Document Index</a>

I'm guessing that it would be wrong to annotate the _design link with
a rel because a document isn't a design document by virtue of its
URL, and the _design/ URL is really just a view. This suggests to me
that maybe _design/ shouldn't be hard-coded, but should be just
another view defined using the existing mechanism e.g. _view/_design.
This touches on the recent discussion about design docs being passed
to views.
IMO the reference docs on the Wiki really belong with the code, and an
obvious feature would be to serve those documents from the server.
I wonder if this idea conforms to this requirement:
"A REST API should spend almost all of its descriptive effort in
defining the media type(s) used for representing resources and driving
application state, or in defining extended relation names and/or
hypertext-enabled mark-up for existing standard media types."
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Borrow money from pessimists - they don't expect it back.
-- Steven Wright
-
Noah Slater at Nov 15, 2008 at 2:18 am ⇧
-
Noah Slater at Nov 19, 2008 at 8:36 am ⇧
Hey,
Just a quick update on this thread. I came across this today:
http://www.sitepen.com/blog/2008/11/18/when-to-use-persevere-a-comparison-with-couchdb-and-others/
Unfortunately, depressingly inaccurate, but it did lead me to:
http://persevere.sitepen.com/#about
And in turn:
http://www.sitepen.com/blog/2008/07/16/jsonquery-data-querying-beyond-jsonpath/
http://goessner.net/articles/JsonPath/
http://www.json.com/2007/10/19/json-referencing-proposal-and-library/
Food for thought.
Best,
-
Damien Katz at Nov 13, 2008 at 5:01 pm ⇧
I was planning on something similar to this for field and attachment
level replication, where only the fields or attachments that are
changed are replicated. With the scheme I'm thinking of, it's possible
to have it incremental at any nested level of the doc tree, but I'm
not sure the extra overhead is worth doing it beyond the root fields.
However, Michael's concern of the document getting larger and the app
getting slower still applies: the document must still be loaded into
memory on the server and the diffs applied, and the complete doc will
need to be loaded into memory for view indexing too. Michael,
regardless of the diff updates, I'm thinking you need to break your
document up into multiple documents.
-Damien

On Nov 13, 2008, at 11:40 AM, Ayende Rahien wrote:
> I think that this should be pretty easily done using:
> a) well defined pretty format output
> b) standard diff
> The reason for (a) is that you need this to get line breaks, which are
> critical to diffing correctly.
>
> On Thu, Nov 13, 2008 at 6:38 PM, Noah Slater wrote:
>> On Thu, Nov 13, 2008 at 08:30:17AM -0800, Michael Ramirez wrote:
>>> Will this cause bandwidth issues when updating large documents if only a
>>> single field changes. I am afraid that as my documents grow larger my app
>>> gets slower.
>> I for one am interested to hear JSON diff proposals. I think this would
>> make a great addition to CouchDB. As best I can tell, this should really
>> be done as an external standardisation effort so the whole community could
>> benefit. I don't think using JavaScript to set the document attributes is
>> a very good solution to this. An entirely new Media Type is needed, IMHO.
--
Noah Slater, http://tumbolia.org/nslater -
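Damien does not spell out his scheme, so the following is only a guess at what "incremental at the root fields" could look like: compare two versions of a document one top-level field at a time and ship only the fields that changed or disappeared. The function names and the { set, unset } delta shape are assumptions made for this sketch.

```javascript
// Compare two versions of a document at the root level only.
// Nested values are compared by their JSON text, so a change anywhere
// inside a field resends that whole field. Key-order differences inside
// nested objects would also count as a change (a limitation of this sketch).
function rootDiff(oldDoc, newDoc) {
  const diff = { set: {}, unset: [] };
  for (const key of Object.keys(newDoc)) {
    if (JSON.stringify(oldDoc[key]) !== JSON.stringify(newDoc[key])) {
      diff.set[key] = newDoc[key];
    }
  }
  for (const key of Object.keys(oldDoc)) {
    if (!(key in newDoc)) diff.unset.push(key);
  }
  return diff;
}

// Apply a root-level diff to a document, returning a new object.
function applyRootDiff(doc, diff) {
  const result = { ...doc, ...diff.set };
  for (const key of diff.unset) delete result[key];
  return result;
}

const v1 = { _id: "d", title: "Draft", body: "long text ...", tags: ["a"], tmp: 1 };
const v2 = { _id: "d", title: "Final", body: "long text ...", tags: ["a"] };
const delta = rootDiff(v1, v2);
console.log(JSON.stringify(delta)); // {"set":{"title":"Final"},"unset":["tmp"]}
```

The large unchanged "body" field never appears in the delta, which is the bandwidth saving under discussion. As Damien notes, the receiving side still has to load the whole document to apply it.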
Michael Ramirez at Nov 13, 2008 at 5:16 pm ⇧
If I begin breaking up my documents into related documents aren't I just creating a relational database?
Michael
----- Original Message ----
From: Damien Katz <damien@apache.org>
To: couchdb-user@incubator.apache.org
Sent: Thursday, November 13, 2008 10:00:44 AM
Subject: Re: Document Updates
I was planning on something similar to this for field and attachment level replication, where only the fields or attachments that are changed are replicated. With the scheme I'm thinking of, it's possible to have it incremental at any nested level of the doc tree, but I'm not sure the extra overhead is worth doing it beyond the root fields.
However, Michael's concern of the document getting larger and the app getting slower still applies: the document must still be loaded into memory on the server and the diffs applied, and the complete doc will need to be loaded into memory for view indexing too. Michael, regardless of the diff updates, I'm thinking you need to break your document up into multiple documents.
-Damien
-
Noah Slater at Nov 13, 2008 at 5:24 pm ⇧
On Thu, Nov 13, 2008 at 09:14:47AM -0800, Michael Ramirez wrote:
> If I begin breaking up my documents into related documents aren't I just
> creating a relational database?

Well, I don't think Damien was dismissing differential updates.
I do disagree with Damien on his points about root level changes. I think a
generic JSON diff format would be hugely advantageous. Again, this is the kind
of thing that would need to be standardised and baked into JSON client libraries
before it could be used properly though.
I think Damien was pointing out that no matter if you have differential updates,
the size of the documents still affects performance; disk IO, memory and view
calculation all suffer. So, there is a balance to strike between convenience and
performance. It is entirely up to you how that should be addressed per app.
-
Ara.t.howard at Nov 13, 2008 at 5:25 pm ⇧
On Nov 13, 2008, at 10:14 AM, Michael Ramirez wrote:
> If I begin breaking up my documents into related documents aren't I
> just creating a relational database?

this is where i find myself too: all the modeling issues with couch
seem to elicit this suggestion. the issue is that it's quite difficult
to manipulate multiple docs without facilities like 'select for
update' and 'begin transaction'. so far the only approach i've come
up with, once docs are split out, is to read them all, perform the
update, and then write them all back. otherwise any computed value
risks being based on stale data.
it really does seem strange to me that so many solutions to couch
involve re-creating relational constructs - like there must be a
better way....
a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama -
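For a single document, the stale-data risk ara describes is usually handled with a revision check plus retry: a write carrying an outdated revision is rejected, and the client re-reads and tries again. The sketch below simulates that with an in-memory store. The `Store` class, the integer revisions and the thrown "conflict" error are stand-ins invented for illustration (CouchDB itself answers a stale write with an HTTP 409 response). It does not solve the multi-document case raised above.

```javascript
// Tiny in-memory stand-in for a database with revision checks.
// A real CouchDB answers a stale write with HTTP 409; here we throw instead.
class Store {
  constructor() { this.docs = new Map(); }
  get(id) { return JSON.parse(JSON.stringify(this.docs.get(id))); }
  put(doc) {
    const current = this.docs.get(doc._id);
    if (current && current._rev !== doc._rev) throw new Error("conflict");
    this.docs.set(doc._id, { ...doc, _rev: (doc._rev || 0) + 1 });
  }
}

// Read, mutate, write; on conflict re-read and try again.
function updateWithRetry(store, id, mutate, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = store.get(id);
    mutate(doc);
    try {
      store.put(doc);
      return attempt;
    } catch (err) {
      if (err.message !== "conflict") throw err;
    }
  }
  throw new Error("gave up after " + maxAttempts + " attempts");
}

const store = new Store();
store.put({ _id: "counter", total: 0 });

// Simulate a concurrent writer sneaking in during our first attempt.
let interfered = false;
const attempts = updateWithRetry(store, "counter", (doc) => {
  if (!interfered) {
    interfered = true;
    const other = store.get("counter");
    other.total += 10;
    store.put(other);
  }
  doc.total += 1;
});
console.log(attempts, store.get("counter").total); // 2 11
```

The first attempt is computed from stale data and is rejected; the second attempt re-reads the document and so includes the other writer's change.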
Damien Katz at Nov 13, 2008 at 5:32 pm ⇧
Not necessarily relational, it depends on the use case and how much
you can denormalize. But if the document keeps growing, is it really a
document, or a bunch of documents bound together?
While some XML databases allow documents that are gigabytes or even
terabytes in size, CouchDB documents are meant to be individually held
in-memory. And while both operate on documents, the query and access
models differ greatly. It might be that an XML or relational database is
a better fit for your app.
-Damien

On Nov 13, 2008, at 12:14 PM, Michael Ramirez wrote:
> If I begin breaking up my documents into related documents aren't I
> just creating a relational database?
> Michael
-
Antony Blakey at Nov 13, 2008 at 9:39 pm ⇧
On 14/11/2008, at 3:30 AM, Damien Katz wrote:
> I was planning on something similar to this for field and attachment
> level replication, where only the fields or attachments that are
> changed are replicated.

Are you planning on sending attachment deltas e.g. rsync/unison? That
would be enormously useful for my CouchDB app.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
A Man may make a Remark –
In itself – a quiet thing
That may furnish the Fuse unto a Spark
In dormant nature – lain –
Let us divide – with skill –
Let us discourse – with care –
Powder exists in Charcoal –
Before it exists in Fire –
-– Emily Dickinson 913 (1865)
Discussion Overview
| group | user |
| categories | couchdb |
| posted | Nov 13, '08 at 4:21p |
| active | Nov 19, '08 at 8:36a |
| posts | 51 |
| users | 9 |
| website | couchdb.apache.org |
| irc | #couchdb |
