[CouchDB-user] Document Updates

Michael Ramirez
Nov 13, 2008 at 4:21 pm
When updating documents, must the entire document be resent, or just the changed fields?

Michael

50 responses

  • Paul Davis at Nov 13, 2008 at 4:22 pm
    The entire document.
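
A minimal sketch of what "the entire document" means in practice, assuming a local server at `http://localhost:5984`, a database named `blog`, and a made-up document (all three are illustrative, not from the thread). The request is only built, not sent; the point is that the body carries every field plus the current `_rev`, even when a single field changed.

```python
import json
import urllib.request


def build_full_update(base_url, db, doc):
    """Build (but do not send) a PUT that replaces the whole document."""
    if "_id" not in doc or "_rev" not in doc:
        raise ValueError("an update needs the current _id and _rev")
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url="%s/%s/%s" % (base_url, db, doc["_id"]),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical document, as it might look after a previous GET.
doc = {"_id": "post-1", "_rev": "1-abc", "title": "Hello", "body": "x" * 1000}
doc["title"] = "Hello again"  # only one field changes...
request = build_full_update("http://localhost:5984", "blog", doc)
# ...but the request body still carries every field.
print(request.get_method(), request.full_url, len(request.data), "bytes")
```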


  • Michael Ramirez at Nov 13, 2008 at 4:30 pm
    Will this cause bandwidth issues when updating large documents if only a single field changes? I am afraid that as my documents grow larger, my app will get slower.


    Michael



  • Noah Slater at Nov 13, 2008 at 4:38 pm

    On Thu, Nov 13, 2008 at 08:30:17AM -0800, Michael Ramirez wrote:
    Will this cause bandwidth issues when updating large documents if only a
    single field changes. I am afraid that as my documents grow larger my app gets
    slower.
    I for one am interested to hear JSON diff proposals. I think this would make a
    great addition to CouchDB. As best I can tell, this should really be done as an
    external standardisation effort so the whole community could benefit. I don't
    think using JavaScript to set the document attributes is a very good solution to
    this. An entirely new Media Type is needed, IMHO.
  • Ayende Rahien at Nov 13, 2008 at 4:41 pm
    I think that this should be pretty easily done using:
    a) well defined pretty format output
    b) standard diff

    The reason for (a) is that you need this to get line breaks, which are
    critical to diffing correctly.
  • Noah Slater at Nov 13, 2008 at 4:44 pm

    On Thu, Nov 13, 2008 at 06:40:44PM +0200, Ayende Rahien wrote:
    I think that this should be pretty easily done using:
    a) well defined pretty format output
    b) standard diff

    The reason for (a) is that you need this to get line breaks, which are
    critical to diffing correctly.
    It's a bit more complex than that: canonicalised JSON is still in its infancy,
    so we would have to get the community to adopt that first. I know that people
    have been discussing JSON diffs before; it may be worth looking up what's already
    been done on this.
  • Paul Davis at Nov 13, 2008 at 7:07 pm

    On Thu, Nov 13, 2008 at 11:43 AM, Noah Slater wrote:
    It's a bit more complex than that: canonicalised JSON is still in its infancy,
    so we would have to get the community to adopt that first. I know that people
    have been discussing JSON diffs before; it may be worth looking up what's already
    been done on this.
    I think the JSON diff is a great idea. Unfortunately, the RFC is a bit
    worrisome in one respect:

    Section 2.2:
    'The names within an object SHOULD be unique.'

    I think that this could be a pretty big stumbling block if different
    parsers start taking different interpretations of that, and I've already
    seen implementations do slightly different things with repeated field
    names.

    Contemplating the canonical spec, I think I would prefer updating the
    JSON spec's use of SHOULD to MUST. The canonical thing to me seems
    more like a normalization method as opposed to a hard and fast spec.
    Specifically, I could see lots of wasted cycles spent on maintaining
    canonicalization when it's needed relatively infrequently.

    Assuming the change to MUST, I could probably write and implement a
    first draft of the spec in a day.

    How hard can it be to change an RFC?

    Paul
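
For illustration, here is a sketch of what reading SHOULD as MUST could look like on the parsing side, using Python's standard `json` module and its `object_pairs_hook` parameter. The helper names `reject_duplicates` and `strict_loads` are made up for this example; this is one possible strict reader, not anything CouchDB or the RFC prescribes.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that treats a repeated name as an error."""
    obj = {}
    for name, value in pairs:
        if name in obj:
            raise ValueError("duplicate name in object: %r" % name)
        obj[name] = value
    return obj


def strict_loads(text):
    """Parse JSON, failing if any object (at any depth) repeats a name."""
    return json.loads(text, object_pairs_hook=reject_duplicates)


print(strict_loads('{"a": 1, "b": 2}'))
try:
    strict_loads('{"a": 1, "a": 2}')
except ValueError as exc:
    print("rejected:", exc)
```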
  • Noah Slater at Nov 13, 2008 at 9:04 pm

    On Thu, Nov 13, 2008 at 02:06:30PM -0500, Paul Davis wrote:
    How hard can it be to change an RFC?
    I hope that's humour! :p
  • Paul Davis at Nov 13, 2008 at 9:14 pm
    On Thu, Nov 13, 2008 at 4:04 PM, Noah Slater wrote:
    On Thu, Nov 13, 2008 at 02:06:30PM -0500, Paul Davis wrote:
    How hard can it be to change an RFC?
    I hope that's humour! :p
    It's only *one* word! :D
  • Noah Slater at Nov 13, 2008 at 9:32 pm

    On Thu, Nov 13, 2008 at 04:13:44PM -0500, Paul Davis wrote:
    On Thu, Nov 13, 2008 at 4:04 PM, Noah Slater wrote:
    On Thu, Nov 13, 2008 at 02:06:30PM -0500, Paul Davis wrote:
    How hard can it be to change an RFC?
    I hope that's humour! :p
    It's only *one* word! :D
    Heh. Well, I've been waiting for about three months for a simple update to the
    Atom Relationships record with IANA. I doubt the IETF is any faster. In
    addition, I should imagine it would have to be a revised standard, and these
    things take ages. If you wanna do it, and get your name on the RFC, well...

    I still think it's best to see what other effort has been done in this area.
  • Antony Blakey at Nov 13, 2008 at 9:50 pm

    On 14/11/2008, at 3:13 AM, Noah Slater wrote:
    On Thu, Nov 13, 2008 at 06:40:44PM +0200, Ayende Rahien wrote:
    I think that this should be pretty easily done using:
    a) well defined pretty format output
    b) standard diff

    The reason for (a) is that you need this to get line breaks, which are
    critical to diffing correctly.
    It's a bit more complex than that: canonicalised JSON is still in its infancy,
    so we would have to get the community to adopt that first. I know that people
    have been discussing JSON diffs before; it may be worth looking up what's already
    been done on this.
    Given the XML/JSON isomorphism, I wonder if something like this:
    http://www.springerlink.com/content/r1t6h8631868k615/
    would be a good start for computing a diff.

    The relevant section from XQuery Update,
    http://www.w3.org/TR/xquery-update-10/#id-update-primitives, might be a
    useful starting point for defining a JSON-encoded (recursive) EDL-based
    structural diff.

    IME a structural diff is better for these purposes than a traditional
    text-diff over a canonicalized format. It's certainly easier to
    generate for clients.

    Antony Blakey
    -------------
    CTO, Linkuistics Pty Ltd
    Ph: 0438 840 787

    On the other side, you have the customer and/or user, and they tend to
    do what we call "automating the pain." They say, "What is it we're
    doing now? How would that look if we automated it?" Whereas, what the
    design process should properly be is one of saying, "What are the
    goals we're trying to accomplish and how can we get rid of all this
    task crap?"
    -- Alan Cooper
  • Noah Slater at Nov 13, 2008 at 10:03 pm

    On Fri, Nov 14, 2008 at 08:19:22AM +1030, Antony Blakey wrote:
    The relevant section from XQuery Update,
    http://www.w3.org/TR/xquery-update-10/#id-update-primitives, might be a useful
    starting point for defining a JSON-encoded (recursive) EDL-based structural
    diff.
    I think an XQuery/XPath type solution would be very interesting.
  • Antony Blakey at Nov 13, 2008 at 10:39 pm

    On 14/11/2008, at 8:33 AM, Noah Slater wrote:
    On Fri, Nov 14, 2008 at 08:19:22AM +1030, Antony Blakey wrote:
    The relevant section from XQuery Update,
    http://www.w3.org/TR/xquery-update-10/#id-update-primitives, might be a useful
    starting point for defining a JSON-encoded (recursive) EDL-based structural
    diff.
    I think an XQuery/XPath type solution would be very interesting.
    IMO the simplest thing that would work (ignoring representation) looks
    something like this:

    insert <json> in <jsonpath>
    insert <json> after <jsonpath>
    insert <json> before <jsonpath>
    delete <jsonpath>
    replace <jsonpath> with <json>

    where jsonpath is roughly as: http://goessner.net/articles/JsonPath/
    without the executable expressions.

    Diff computation would undoubtedly generate a restricted subset of
    jsonpath selectors, but it's worth supporting the wildcard/recursive
    descent operations for clients.

    Representing the update document as json itself would be clean, so an
    EDL could look like this:

    [
      { "replace": "$.post.comments[2].email",
        "with": "antony@linkuistics.com" },
      { "insert": { "email": .... }, "in": "$.post.comments" },
      { "insert": { "email": .... }, "after": "$.post.comments[5]" }
    ]

    or, using a meta-encoding (which IMO is unnecessary):

    [
      { "op": "replace", "path": "$.post.comments[2].email",
        "content": "antony@linkuistics.com" },
      { "op": "insert-in", "path": "$.post.comments",
        "content": { "email": .... } },
      { "op": "insert-after", "path": "$.post.comments[5]",
        "content": { "email": .... } }
    ]

    I propose that these aren't declarative, but procedural, in the sense
    that they are applied linearly and hence each path context is the
    result of the preceding edits, rather than the original tree. This
    complicates the encoding of diffs but results in a much simpler apply
    mechanism. But maybe it would be worth using a declarative form with a
    constant context - I'm unsure about the tradeoffs.

    Antony Blakey
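
The procedural reading described in the message above can be sketched as follows. This is an illustrative interpreter, not the proposal's definition: it assumes a tiny path subset (`$`, `.name`, `[index]`) instead of full JSONPath, it assumes that "insert ... in" appends to an array or merges into an object, and it does not handle edits that target the root `$` itself. The function names are invented for the example.

```python
import copy
import re

_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def parse_path(path):
    """Turn '$.post.comments[2].email' into ['post', 'comments', 2, 'email']."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$': %r" % path)
    steps, pos = [], 1
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            raise ValueError("unsupported path syntax: %r" % path)
        name, index = match.groups()
        steps.append(name if name is not None else int(index))
        pos = match.end()
    return steps


def resolve(doc, steps):
    """Walk down the document following the parsed steps."""
    for step in steps:
        doc = doc[step]
    return doc


def apply_edits(doc, edits):
    """Apply edits in order; each path is resolved against the result so far."""
    doc = copy.deepcopy(doc)
    for edit in edits:
        if "replace" in edit:
            steps = parse_path(edit["replace"])
            resolve(doc, steps[:-1])[steps[-1]] = edit["with"]
        elif "delete" in edit:
            steps = parse_path(edit["delete"])
            del resolve(doc, steps[:-1])[steps[-1]]
        elif "insert" in edit and "in" in edit:
            target = resolve(doc, parse_path(edit["in"]))
            if isinstance(target, list):
                target.append(edit["insert"])   # assumption: append to arrays
            else:
                target.update(edit["insert"])   # assumption: merge into objects
        elif "insert" in edit and ("after" in edit or "before" in edit):
            key = "after" if "after" in edit else "before"
            steps = parse_path(edit[key])
            parent, index = resolve(doc, steps[:-1]), steps[-1]
            offset = 1 if key == "after" else 0
            parent.insert(index + offset, edit["insert"])
        else:
            raise ValueError("unknown edit: %r" % edit)
    return doc


doc = {"post": {"comments": [{"email": "a@example.org"},
                             {"email": "b@example.org"}]}}
edits = [
    {"replace": "$.post.comments[1].email", "with": "c@example.org"},
    {"insert": {"email": "d@example.org"}, "after": "$.post.comments[0]"},
    # Index 2 refers to the array *after* the insert above.
    {"delete": "$.post.comments[2]"},
]
result = apply_edits(doc, edits)
print(result)
```

The last edit shows the consequence of the procedural form: its index only makes sense against the document as modified by the earlier edits, which is what keeps the apply step this simple.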
  • Noah Slater at Nov 13, 2008 at 11:41 pm

    On Fri, Nov 14, 2008 at 09:09:03AM +1030, Antony Blakey wrote:
    IMO the simplest thing that would work (ignoring representation) looks
    something like this:

    insert <json> in <jsonpath>
    insert <json> after <jsonpath>
    insert <json> before <jsonpath>
    delete <jsonpath>
    replace <jsonpath> with <json>

    where jsonpath is roughly as: http://goessner.net/articles/JsonPath/
    without the executable expressions.
    Hmm, this seems pretty cool.

    I did some digging to see what else is out there:

    * http://intertwingly.net/blog/2008/02/21/APP-Level-Patch
    * http://blog.mozilla.com/rob-sayre/2008/02/15/restful-partial-updates/
    * http://www.snellspace.com/wp/?p=895
    * http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0316.html
    * http://www.snellspace.com/wp/?p=902

    I think I agree that if this were to hit CouchDB, it should be done via the PATCH
    method, which makes the most sense given the context.
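
As a rough sketch of the shape such a request might take, assuming a server that accepted PATCH on a document URL: the URL, the `rev` query parameter usage, the edit-list body, and the media type `application/x-json-edits` are all hypothetical placeholders, since no format had been agreed in this thread. The request is built but never sent.

```python
import json
import urllib.request

# Hypothetical edit list and media type; neither is defined by CouchDB or an RFC.
edits = [{"replace": "$.title", "with": "Hello again"}]
request = urllib.request.Request(
    url="http://localhost:5984/blog/post-1?rev=1-abc",
    data=json.dumps(edits).encode("utf-8"),
    headers={"Content-Type": "application/x-json-edits"},
    method="PATCH",
)
# Only the edits travel over the wire, not the whole document.
print(request.get_method(), len(request.data), "bytes")
```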
  • Adam Kocoloski at Nov 14, 2008 at 12:03 am
    Partial updates are also a very popular discussion topic on the
    restful-json Google group:

    http://groups.google.com/group/restful-json

    But AFAIK no consensus has been reached yet.

    Best, Adam
  • Chris Anderson at Nov 14, 2008 at 12:10 am

    Forgive me for throwing out a loose-cannon idea, but would it be
    easiest to provide an API where the user sends a Javascript function
    to CouchDB via the PATCH method? The function could look something
    like:

    function(doc) {
      doc.my_field = "new value";
      doc.existing_array[3] = "another new value";
      doc.new_array = ["a", "b", 3];
      return doc;
    }

    --
    Chris Anderson
    http://jchris.mfdz.com
  • Antony Blakey at Nov 14, 2008 at 12:32 am

    On 14/11/2008, at 10:39 AM, Chris Anderson wrote:
    Forgive me for throwing out a loose-cannon idea, but would it be
    easiest to provide an API where the user sends a Javascript function
    to CouchDB via the PATCH method? The function could look something
    like:

    function(doc) {
    doc.my_field = "new value";
    doc.existing_array[3] = "another new value";
    doc.new_array = ["a", "b", 3];
    return doc;
    }
    I thought that javascript wasn't part of the Couch core? JSON isn't
    javascript, and all uses of javascript *could* be replaced with e.g.
    Ruby (or my interest, Smalltalk), which is why there is a "language"
    attribute on the views.

    Your proposal would change that.

    Antony Blakey
  • Antony Blakey at Nov 14, 2008 at 12:49 am
    On 14/11/2008, at 11:01 AM, Antony Blakey wrote:
    I thought that javascript wasn't part of the Couch core? JSON isn't
    javascript, and all uses of javascript *could* be replaced with e.g.
    Ruby (or my interest, Smalltalk), which is why there is a "language"
    attribute on the views.

    Your proposal would change that.
    You could use the view mechanism, and attach a "language" attribute,
    and have this be a general transformation interface, which would
    indeed be very nice. For efficiency you would want to apply this over
    sets of documents, and probably in a transactional context like bulk
    update does now.

    However... Damien wants something to use in replication, which would
    mean that javascript would then become a required, rather than an
    optional part of Couch, because replication would require it (unless
    you made the replication diff generator pluggable ... but why go
    there?). The benefit of the declarative diff format is that applying a
    diff can be done within Couch.

    Antony Blakey
  • Ara.t.howard at Nov 14, 2008 at 1:02 am

    On Nov 13, 2008, at 5:49 PM, Antony Blakey wrote:

    You could use the view mechanism, and attach a "language" attribute,
    and have this be a general transformation interface, which would
    indeed be very nice. For efficiency you would want to apply this
    over sets of documents, and probably in a transactional context like
    bulk update does now.

    However... Damien wants something to use in replication, which would
    mean that javascript would then become a required, rather than an
    optional part of Couch, because replication would require it (unless
    you made the replication diff generator pluggable ... but why go
    there?). The benefit of the declarative diff format is that applying
    a diff can be done within Couch.
    couldn't these queries run in the view server? in fact any mechanism
    which would allow the view server could accomplish this with a
    protocol between it and the db server. basically it's an addition to
    the map/reduce functionality which would alter documents on the fly.

    a @ http://codeforpeople.com/
    --
    we can deny everything, except that we have the possibility of being
    better. simply reflect on that.
    h.h. the 14th dalai lama
  • Chris Anderson at Nov 14, 2008 at 1:22 am

    On Thu, Nov 13, 2008 at 5:02 PM, ara.t.howard wrote:
    On Nov 13, 2008, at 5:49 PM, Antony Blakey wrote:

    You could use the view mechanism, and attach a "language" attribute, and
    have this be a general transformation interface, which would indeed be very
    nice. For efficiency you would want to apply this over sets of documents,
    and probably in a transactional context like bulk update does now.

    However... Damien wants something to use in replication, which would mean
    that javascript would then become a required, rather than an optional part
    of Couch, because replication would require it (unless you made the
    replication diff generator pluggable ... but why go there?). The benefit of
    the declarative diff format is that applying a diff can be done within
    Couch.
    couldn't these queries run in the view server? in fact any mechanism which
    would allow the view server could accomplish this with a protocol between it
    and the db server. basically it's an addition to the map/reduce
    functionality which would alter documents on the fly.
    Antony's right: currently, replication does not depend on the
    availability of the view server. And I think it is smart to avoid that
    dependence, when possible.

    Alas, my attempt to bypass all the craziness that is canonical JSON,
    has come short of that. Oh wells...

    --
    Chris Anderson
    http://jchris.mfdz.com
  • Antony Blakey at Nov 14, 2008 at 1:41 am

    On 14/11/2008, at 11:50 AM, Chris Anderson wrote:

    Alas, my attempt to bypass all the craziness that is canonical JSON,
    has come short of that. Oh wells...
    My proposal doesn't require canonical JSON because it is structural
    rather than textual. That's one reason I think it's a good approach.

    Antony Blakey
  • Paul Davis at Nov 14, 2008 at 2:34 am
    I don't think we need canonical JSON.

    The Spec definitely needs to be disambiguated though. As I see it
    there are two interpretations:

    1. Order of fields matters which means repeated fields are ok
    2. Order does not matter which means repeated fields are NOT ok

    It doesn't matter which is chosen, but one of them must be to make this work.

    Also, I got bored. So I implemented JSON diff in python for Case #2.

    http://www.davispj.com/svn/projects/json-diff/json-diff.py
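
The linked script is not reproduced here. As a stand-in, this is a minimal sketch of a structural diff under interpretation 2 (objects as unordered maps with unique names). It emits edits in the replace/insert/delete shape proposed earlier in the thread, assumes names contain no dots, and replaces arrays and scalars wholesale rather than diffing inside them. The function name `diff` is illustrative.

```python
def diff(old, new, path="$"):
    """Return a list of edits turning `old` into `new`.

    Objects are compared as unordered maps with unique names (case 2).
    Anything that is not a pair of dicts is replaced wholesale.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        edits = []
        for name in sorted(old.keys() - new.keys()):
            edits.append({"delete": "%s.%s" % (path, name)})
        for name in sorted(new.keys() - old.keys()):
            edits.append({"insert": {name: new[name]}, "in": path})
        for name in sorted(old.keys() & new.keys()):
            edits.extend(diff(old[name], new[name], "%s.%s" % (path, name)))
        return edits
    if old == new:
        return []
    return [{"replace": path, "with": new}]


old = {"title": "Hello", "tags": ["a"], "meta": {"views": 1, "draft": True}}
new = {"title": "Hello", "tags": ["a", "b"], "meta": {"views": 2}, "author": "m"}
for edit in diff(old, new):
    print(edit)
```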

    I gotta jet, but when I get home in a bit I'm gonna write a JSON fuzz
    library and then pound the diff thing with it.

    Not sure if it's obvious or not, but switching from case 2 to 1 is
    straightforward. Also, my current array diff implementation is kinda
    whack: indels screw up the rest of the diff, as in, it's not so much a
    diff as a "delete the rest and add new". Getting this optimal is actually
    an O(N^2) algorithm via dynamic programming (Smith-Waterman style).
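
To make the quadratic dynamic-programming idea concrete, here is a sketch that uses a plain longest-common-subsequence table (a simpler relative of Smith-Waterman, without scoring or gap penalties). It is not the code from the linked script; `lcs_table` and `array_diff` are names chosen for this example.

```python
def lcs_table(a, b):
    """table[i][j] = length of the LCS of a[i:] and b[j:]; O(len(a) * len(b))."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def array_diff(a, b):
    """Return ('keep' | 'delete' | 'insert', item) steps turning a into b."""
    table = lcs_table(a, b)
    steps, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            steps.append(("keep", a[i]))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            steps.append(("delete", a[i]))
            i += 1
        else:
            steps.append(("insert", b[j]))
            j += 1
    # Whatever is left on either side is a pure deletion or insertion.
    steps.extend(("delete", item) for item in a[i:])
    steps.extend(("insert", item) for item in b[j:])
    return steps


for step in array_diff([1, 2, 3, 4], [1, 3, 4, 5]):
    print(step)
```

With this approach a single deletion near the front no longer forces everything after it to be deleted and re-added; the elements that survive are reported as kept.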

    Also, do note that the Erlang parser and Python (and I assume Ruby is
    in the Python boat) behave differently with respect to the two
    cases: Erlang is case 1, Python is case 2.

    Paul

  • Antony Blakey at Nov 14, 2008 at 2:53 am

    On 14/11/2008, at 1:04 PM, Paul Davis wrote:

    I don't think we need canonical JSON.

    The Spec definitely needs to be disambiguated though. As I see it
    there are two interpretations:

    1. Order of fields matters which means repeated fields are ok
    2. Order does not matter which means repeated fields are NOT ok
    Given that JSON is executable Javascript, 2 is the only interpretation
    that allows for roundtrip equivalence.
    Not sure if it's obvious or not, but switching from case 2 to 1 is
    straightforward. Also, my current array diff implementation is kinda
    whack. And indels screw the rest of the diff, as in, its not so much a
    diff as a delete rest and add new. Getting this optimal is actually an
    N^2 runtime algorithm via dynamic programming (smith-waterman style)
    The algorithm described here: http://www.springerlink.com/content/r1t6h8631868k615/
    is O(n), and although it isn't optimal, I'm guessing the performance
    stability makes up for it.

    Antony Blakey
  • Noah Slater at Nov 14, 2008 at 4:06 am

    On Fri, Nov 14, 2008 at 01:22:58PM +1030, Antony Blakey wrote:
    1. Order of fields matters which means repeated fields are ok
    2. Order does not matter which means repeated fields are NOT ok
    Given that JSON is executable Javascript, 2 is the only interpretation
    that allows for roundtrip equivalence.
    I fear this is rather a large jump to conclusion.
  • Antony Blakey at Nov 14, 2008 at 7:02 am

    On 14/11/2008, at 2:36 PM, Noah Slater wrote:
    On Fri, Nov 14, 2008 at 01:22:58PM +1030, Antony Blakey wrote:
    1. Order of fields matters which means repeated fields are ok
    2. Order does not matter which means repeated fields are NOT ok
    Given that JSON is executable Javascript, 2 is the only
    interpretation
    that allows for roundtrip equivalence.
    I fear this is rather a large jump to conclusion.
    I'm only claiming that *roundtrip equivalence* is possible only if you
    don't allow duplicate keys, because JSON, being a serialization format
    for (limited) Javascript data structures, cannot be generated with
    duplicate keys from those data structures.

    Being executable, its interpretation is defined by the Javascript
    spec. JSON is a serialization of a (limited) Javascript data
    structure. Javascript hashes don't allow for duplicate keys, nor do
    they (AFAIR) provide any ordering guarantees. I contend that the
    semantics of JSON follow from this, and as an extension I wonder if
    any JSON that isn't a serialization of some Javascript data structure
    should not be valid JSON. OTOH, the operational interpretation would
    suggest that maybe the text representation can have duplicate keys,
    but that the data structure that it represents (which is what we
    should care about) does not.
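
The round-trip point can be illustrated with Python's standard `json` module, where a `dict` plays the role of the Javascript object (an analogy for this example, not a claim about any particular Javascript engine). Parsing text with a repeated name into a map keeps one value, so serializing it again cannot reproduce the original text, whereas keeping the raw name/value pairs preserves both order and repetition.

```python
import json

text = '{"a": 1, "a": 2, "b": 3}'

# Default parsing builds a dict, so only one value per name survives.
as_dict = json.loads(text)

# Keeping the raw pairs preserves both order and repetition.
as_pairs = json.loads(text, object_pairs_hook=list)

round_tripped = json.dumps(as_dict)
print(as_dict)
print(as_pairs)
print(round_tripped)
```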

    Antony Blakey
  • Paul Davis at Nov 15, 2008 at 1:38 am
    I wrote a fuzz thing to go along with the diff testing.

    You can get it with:

    $ sudo easy_install jsontools

    # Examples
    from StringIO import StringIO
    import jsontools
    stream = StringIO()

    # Fuzzy objects
    fj = jsontools.FuzzyJson()
    obj1 = fj.generate(1).next()
    obj2 = fj.modify(obj1)

    # Diff the objects
    jsontools.jsondiff(obj1, obj2, stream=stream)

    # Apply the diff
    stream.seek(0)
    result = jsontools.jsonapply(stream, obj1)

    # Compare them
    assert jsontools.jsoncmp(result, obj2) == 2

    Any comments?

    Paul
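
For readers without the package above, the fuzz-then-verify idea can be sketched with the standard library alone. Nothing here is the `jsontools` API: `random_json`, `random_object`, `shallow_diff`, and `shallow_apply` are made-up names, and the diff is a deliberately trivial stand-in that only looks at top-level names. The property being hammered is that applying the diff of two objects to the first yields the second.

```python
import random


def random_json(rng, depth=0):
    """Generate a random JSON-compatible value; nesting is capped at depth 3."""
    scalars = [None, True, False, rng.randint(-100, 100), rng.random(),
               "s%d" % rng.randint(0, 9)]
    if depth >= 3 or rng.random() < 0.4:
        return rng.choice(scalars)
    size = rng.randint(0, 4)
    if rng.random() < 0.5:
        return [random_json(rng, depth + 1) for _ in range(size)]
    return {"k%d" % i: random_json(rng, depth + 1) for i in range(size)}


def random_object(rng):
    """Generate a random top-level object with up to five names."""
    return {"k%d" % i: random_json(rng, 1) for i in range(rng.randint(0, 5))}


def shallow_diff(old, new):
    """Stand-in diff over top-level names: (removed names, changed or added pairs)."""
    removed = [name for name in old if name not in new]
    changed = {name: value for name, value in new.items()
               if name not in old or old[name] != value}
    return removed, changed


def shallow_apply(old, delta):
    """Apply a delta produced by shallow_diff to a copy of `old`."""
    removed, changed = delta
    result = {name: value for name, value in old.items() if name not in removed}
    result.update(changed)
    return result


rng = random.Random(2008)  # fixed seed so a failure can be reproduced
for _ in range(500):
    old, new = random_object(rng), random_object(rng)
    assert shallow_apply(old, shallow_diff(old, new)) == new
print("500 random pairs round-tripped")
```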
  • Paul Davis at Nov 15, 2008 at 1:42 am
    On Fri, Nov 14, 2008 at 8:37 PM, Paul Davis wrote:
    I wrote a fuzz thing to go along with the diff testing.

    You can get it with:

    $ sudo easy_install jsontools

    # Examples
    from StringIO import StringIO
    import jsontools
    stream = StringIO()

    # Fuzzy objects
    fj = jsontools.FuzzyJson()
    obj1 = fj.generate(1).next()
    obj2 = fj.modify(obj1)

    # Diff the objects
    jsontools.jsondiff(obj1, obj2, stream=stream)

    # Apply the diff
    stream.seek(0)
    result = jsontools.jsonapply(stream, obj1)

    # Compare them
    assert jsontools.jsoncmp(result, obj2) == 2 == True
    Any comments?

    Paul
    On Thu, Nov 13, 2008 at 9:34 PM, Paul Davis wrote:
    I don't think we need canonical JSON.

    The Spec definitely needs to be disambiguated though. As I see it
    there are two interpretations:

    1. Order of fields matters which means repeated fields are ok
    2. Order does not matter which means repeated fields are NOT ok

    It doesn't matter which is chosen, but one of them must be to make this work.

    Also, I got bored. So I implemented JSON diff in python for Case #2.

    http://www.davispj.com/svn/projects/json-diff/json-diff.py

    I gotta jet, but when I get home in a bit I'm gonna write a JSON fuzz
    library and then pound the diff thing with it.

    Not sure if it's obvious or not, but switching from case 2 to 1 is
    straightforward. Also, my current array diff implementation is kinda
    whack. And indels screw the rest of the diff, as in, its not so much a
    diff as a delete rest and add new. Getting this optimal is actually an
    N^2 runtime algorithm via dynamic programming (smith-waterman style)

    Also, do note that the erlang parser and python (and i assume ruby is
    in the python boat) have different behaviors in respect to the 2
    cases. Erlang is Case 1, python is case 2.

    Paul

    On Thu, Nov 13, 2008 at 8:20 PM, Chris Anderson wrote:
    On Thu, Nov 13, 2008 at 5:02 PM, ara.t.howard wrote:
    On Nov 13, 2008, at 5:49 PM, Antony Blakey wrote:

    You could use the view mechanism, and attach a "language" attribute, and
    have this be a general transformation interface, which would indeed be very
    nice. For efficiency you would want to apply this over sets of documents,
    and probably in a transactional context like bulk update does now.

    However... Damien wants something to use in replication, which would mean
    that javascript would then become a required, rather than an optional part
    of Couch, because replication would require it (unless you made the
    replication diff generator pluggable ... but why go there?). The benefit of
    the declarative diff format is that applying a diff can be done within
    Couch.
    couldn't these queries run in the view server? in fact any mechanism which
    would allow the view server could accomplish this with a protocol between it
    and the db server. basically it's an addition to the map/reduce
    functionality which would alter documents on the fly.
    Antony's right that replication currently does not depend on the
    availability of the view server. And I think it is smart to avoid that
    dependence, when possible.

    Alas, my attempt to bypass all the craziness that is canonical JSON,
    has come short of that. Oh wells...

    --
    Chris Anderson
    http://jchris.mfdz.com
  • Ayende Rahien at Nov 14, 2008 at 4:18 am
    Take into account that the view server is explicitly a separate
    process. Requiring it to process incoming requests would create a
    very high overhead.
    On Fri, Nov 14, 2008 at 3:02 AM, ara.t.howard wrote:


    On Nov 13, 2008, at 5:49 PM, Antony Blakey wrote:

    You could use the view mechanism, and attach a "language" attribute, and
    have this be a general transformation interface, which would indeed be very
    nice. For efficiency you would want to apply this over sets of documents,
    and probably in a transactional context like bulk update does now.

    However... Damien wants something to use in replication, which would mean
    that javascript would then become a required, rather than an optional part
    of Couch, because replication would require it (unless you made the
    replication diff generator pluggable ... but why go there?). The benefit of
    the declarative diff format is that applying a diff can be done within
    Couch.
    couldn't these queries run in the view server? in fact any mechanism which
    would allow the view server could accomplish this with a protocol between it
    and the db server. basically it's an addition to the map/reduce
    functionality which would alter documents on the fly.


    a @ http://codeforpeople.com/
    --
    we can deny everything, except that we have the possibility of being
    better. simply reflect on that.
    h.h. the 14th dalai lama


  • Ara.t.howard at Nov 14, 2008 at 4:53 am

    On Nov 13, 2008, at 9:18 PM, Ayende Rahien wrote:

    Take into account that the view server is explicitly a separate
    process. Requiring it to process incoming requests would create a
    very high overhead.
    i'm just saying - if people could write javascript to execute on the
    server people would really be singing hallelujah. from my perspective
    it's only about 10000000% as cool as being able to plug in a different
    language as a view server.

    2 cts.


    a @ http://codeforpeople.com/
    --
    we can deny everything, except that we have the possibility of being
    better. simply reflect on that.
    h.h. the 14th dalai lama
  • Paul Davis at Nov 14, 2008 at 5:07 am
    The JS PATCH function is definitely an interesting idea, but I don't
    think it is going to fix the issue. How many use cases are going to
    be able to reduce an update operation to a function?

    Most use cases that I see are of the nature: "Download JSON,
    deserialize to native language, mutate, serialize, send to server".
    Anyone who wants to write an abstract "mutate" -> JS function thing has
    props in my book.

    It is foreseeable that you could treat it as a stored procedure though.
    As in, the function signature becomes "function(doc, input)" and input
    is a complex JSON object that is used in the update. This almost has
    actual use in terms of the earlier thread on transaction semantics,
    which I assume would be implementable. But really the transaction idea
    is a whole new can of worms, in that Couch would have to be allowed
    to request multiple documents in one transaction, etc.
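
    A rough sketch of the stored-procedure reading of "function(doc,
    input)", written in Python rather than JS. The registry, the function
    names and the "comments" field are all hypothetical; nothing here is
    an existing CouchDB API.

```python
import copy


def append_comment(doc, params):
    """Hypothetical update function: add one comment to a document."""
    doc.setdefault("comments", []).append(params["comment"])
    return doc


# Hypothetical registry standing in for functions stored on the server.
UPDATE_FUNCTIONS = {"append_comment": append_comment}


def apply_update(doc, name, params):
    """Run a named update function against a copy of the stored document."""
    return UPDATE_FUNCTIONS[name](copy.deepcopy(doc), params)


stored = {"_id": "post1", "comments": []}
updated = apply_update(stored, "append_comment", {"comment": "hi"})
```

    The client would send only the function name and the small params
    object, not the whole document, which is where the bandwidth saving
    would come from.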

    I think Noah's original suggestion is the best bet: get a diff system
    set up well outside of Couch, in JSON land, that's implementable in
    any language for anything.

    Paul
  • Ara.t.howard at Nov 14, 2008 at 12:37 am

    On Nov 13, 2008, at 5:09 PM, Chris Anderson wrote:
    Forgive me for throwing out a loose-cannon idea, but would it be
    easiest to provide an API where the user sends a Javascript function
    to CouchDB via the PATCH method? The function could look something
    like:

    function(doc) {
      doc.my_field = "new value";
      doc.existing_array[3] = "another new value";
      doc.new_array = ["a", "b", 3];
      return doc;
    }

    --
    yeah - i loooooooove this idea!

    a @ http://codeforpeople.com/
    --
    we can deny everything, except that we have the possibility of being
    better. simply reflect on that.
    h.h. the 14th dalai lama
  • Antony Blakey at Nov 14, 2008 at 12:33 am

    On 14/11/2008, at 10:07 AM, Noah Slater wrote:

    I did some digging to see what else is out there:

    * http://intertwingly.net/blog/2008/02/21/APP-Level-Patch
    * http://blog.mozilla.com/rob-sayre/2008/02/15/restful-partial-updates/
    * http://www.snellspace.com/wp/?p=895
    * http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0316.html
    * http://www.snellspace.com/wp/?p=902

    I think I agree that if this was to hit CouchDB it should be done via the
    PATCH method, makes the most sense given the context.
    My only concern with PATCH is that it isn't HTTP 1.1. When I was
    working with WebDAV we continually had problems with proxies that
    didn't deal with non-HTTP-1.1 methods.

    IMO the only other alternative is a POST with either a content-type,
    although I feel uneasy about that if the content actually *is* JSON,
    or a query parameter.

    Alternatively you could use a POST to an extended URL, but that
    interferes with attachments. And as I understand it, that would only
    truly qualify as REST if the URL was included in a document, i.e.
    discoverable rather than specified.

    Antony Blakey
    -------------
    CTO, Linkuistics Pty Ltd
    Ph: 0438 840 787

    Did you hear about the Buddhist who refused Novocain during a root
    canal?
    His goal: transcend dental medication.
  • Antony Blakey at Nov 14, 2008 at 2:03 am

    On 14/11/2008, at 10:07 AM, Noah Slater wrote:

    I think I agree that if this was to hit CouchDB it should be done via the
    PATCH method, makes the most sense given the context.

    You would want to allow partial updates in a bulk operation, so any
    packaging would need to be usable in that context as well. Given
    updates need to be handled separately, maybe deletions should be as
    well.

    {
      "docs": [
        /* Just for backwards compatibility ... but does that matter for
           an alpha product ? */
        ... as now ...
      ],

      "PUT": [
        /* As now with docs, but not allowing "delete":true ? */
        { "_id": ..., "_rev": ..., ... }
        ...
      ],
      "PATCH": [
        { "_id": ..., "_rev": ..., "deltas": [ { "replace":... }, ... ] }
        ...
      ],
      "DELETE": [
        { "_id": ..., "_rev": ... },
        ...
      ]
    }

    This has the benefit of (roughly) representing the HTTP methods that
    it aggregates.

    Antony Blakey
    -------------
    CTO, Linkuistics Pty Ltd
    Ph: 0438 840 787

    It is no measure of health to be well adjusted to a profoundly sick
    society.
    -- Jiddu Krishnamurti
  • Antony Blakey at Nov 14, 2008 at 2:22 am

    On 14/11/2008, at 12:32 PM, Antony Blakey wrote:

    {
      "docs": [
        /* Just for backwards compatibility ... but does that matter for
           an alpha product ? */
        ... as now ...
      ],

      "PUT": [
        /* As now with docs, but not allowing "delete":true ? */
        { "_id": ..., "_rev": ..., ... }
        ...
      ],
      "PATCH": [
        { "_id": ..., "_rev": ..., "deltas": [ { "replace":... }, ... ] }
        ...
      ],
      "DELETE": [
        { "_id": ..., "_rev": ... },
        ...
      ]
    }

    This has the benefit of (roughly) representing the HTTP methods that
    it aggregates.
    On second thought, given that it represents an aggregation of commands
    that have an explicit ordering, maybe it shouldn't be grouped by
    method but instead use the method as a key. Like this:

    [
      { "delete": { "_id": ..., "_rev": ... } },
      { "put": { "_id": ..., "_rev": ..., ... } },
      { "patch": { "_id": ..., "_rev": ... },
        "with": [ { "replace": ..., "with": ... }, ... ] },
      ...
    ]

    The benefit of this is that it is easier to generate and to reason
    about if your client code is doing deletes and inserts of documents
    with the same id. It accurately represents adding a transactional
    boundary without requiring a change in semantics.
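
    A sketch of how such an ordered list could be applied all-or-nothing,
    using a plain dict as a stand-in for the database. The function name
    is hypothetical, _rev checking is left out, and "patch" is omitted
    because the delta format is not pinned down here.

```python
import copy


def apply_operations(store, operations):
    """Apply an ordered list of {"put": doc} / {"delete": doc} operations.

    `store` maps _id -> document. Work happens on a copy, so a failure
    part-way through leaves `store` unchanged.
    """
    working = copy.deepcopy(store)
    for op in operations:
        (verb, doc), = op.items()  # each operation has exactly one key
        if verb == "put":
            working[doc["_id"]] = doc
        elif verb == "delete":
            if doc["_id"] not in working:
                raise KeyError(doc["_id"])
            del working[doc["_id"]]
        else:
            raise ValueError("unknown operation: %s" % verb)
    return working
```

    Because the list is ordered, a delete followed by a put of the same
    id is unambiguous, which is the case the grouped-by-method layout
    cannot express.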

    Antony Blakey
    -------------
    CTO, Linkuistics Pty Ltd
    Ph: 0438 840 787

    When I hear somebody sigh, 'Life is hard,' I am always tempted to ask,
    'Compared to what?'
    -- Sydney Harris
  • Noah Slater at Nov 14, 2008 at 4:03 am

    On Fri, Nov 14, 2008 at 12:32:18PM +1030, Antony Blakey wrote:
    You would want to allow partial updates in a bulk operation, so any
    packaging would need to be usable in that context as well. Given updates
    need to be handled separately, maybe deletions should be as well. ...
    "PUT": [ ...
    "PATCH": [ ...
    "DELETE": [
    We shouldn't be tunneling verbs through media types; this is antithetical to the
    principles of REST and would harm all manner of possible intermediary clients.
  • Antony Blakey at Nov 14, 2008 at 7:05 am

    On 14/11/2008, at 2:32 PM, Noah Slater wrote:
    On Fri, Nov 14, 2008 at 12:32:18PM +1030, Antony Blakey wrote:
    You would want to allow partial updates in a bulk operation, so any
    packaging would need to be usable in that context as well. Given updates
    need to be handled separately, maybe deletions should be as well. ...
    "PUT": [ ...
    "PATCH": [ ...
    "DELETE": [
    We shouldn't be tunneling verbs through media types; this is antithetical to
    the principles of REST and would harm all manner of possible intermediary
    clients.
    I'm not tunneling verbs, I'm just re-using the names of the methods
    that would normally be used as selectors. I wasn't implying anything
    more than that.

    Couch's bulk operation already has this issue. You delete a document
    using the DELETE verb, yet in a bulk operation you set the "_deleted"
    special attribute. That is in effect tunneling the DELETE, using a
    different representation, within a POST.
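
    For illustration, this is roughly what a _bulk_docs body mixing
    updates and deletions looks like when built in Python. The helper
    name and the sample ids and revisions are placeholders; only the
    "docs" wrapper and the "_deleted" attribute come from the discussion
    above.

```python
import json


def bulk_docs_body(updates, deletions):
    """Build a _bulk_docs request body.

    `updates` are full documents; `deletions` are (doc_id, rev) pairs,
    expressed as stub documents carrying "_deleted": true.
    """
    docs = list(updates)
    docs.extend({"_id": doc_id, "_rev": rev, "_deleted": True}
                for doc_id, rev in deletions)
    return json.dumps({"docs": docs})


# Placeholder ids and revisions, purely for demonstration.
body = bulk_docs_body([{"_id": "a", "x": 1}], [("b", "rev-of-b")])
```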

    Antony Blakey
    -------------
    CTO, Linkuistics Pty Ltd
    Ph: 0438 840 787

    There are two ways of constructing a software design: One way is to
    make it so simple that there are obviously no deficiencies, and the
    other way is to make it so complicated that there are no obvious
    deficiencies.
    -- C. A. R. Hoare
  • Noah Slater at Nov 14, 2008 at 11:57 am

    On Fri, Nov 14, 2008 at 05:35:03PM +1030, Antony Blakey wrote:
    I'm not tunneling verbs, I'm just re-using the names of the methods that would
    normally be used as selectors. I wasn't implying anything more than that. ...
    You delete a document using the DELETE verb, yet in a bulk operation you set
    the "_deleted" special attribute. That is in effect tunneling the DELETE,
    using a different representation, within a POST.
    A RESTful system should work by exchanging representations of resources.

    As best I understand it, if you want to modify a resource in a way that is not a
    direct update, move or delete you should use a separate media type, something
    like application/diff+json if it existed. A JSON diff could include ways to
    delete and update multiple documents at the same time, a bit like UNIX diff is
    able to specify filenames. This could be used for single or bulk updates.

    Of course, this feels very similar to your original proposal, which leaves me a
    little confused. Throwing about JSON with keys such as "POST" and "DELETE" feels
    very RPC-like. Perhaps the difference is the use of a separate media type.

    I am eager to be corrected by any resident RESTafarians. For me, REST is a bit
    like Zen. Sometimes I think I understand it totally, and other times I'm
    convinced that I don't understand it at all.

    Best,
  • Antony Blakey at Nov 14, 2008 at 12:23 pm

    On 14/11/2008, at 10:23 PM, Noah Slater wrote:
    On Fri, Nov 14, 2008 at 05:35:03PM +1030, Antony Blakey wrote:
    I'm not tunneling verbs, I'm just re-using the names of the methods that
    would normally be used as selectors. I wasn't implying anything more than
    that. ...
    You delete a document using the DELETE verb, yet in a bulk operation you
    set the "_deleted" special attribute. That is in effect tunneling the
    DELETE, using a different representation, within a POST.
    A RESTful system should work by exchanging representations of resources.

    As best I understand it, if you want to modify a resource in a way that is
    not a direct update, move or delete you should use a separate media type,
    something like application/diff+json if it existed. A JSON diff could
    include ways to delete and update multiple documents at the same time, a
    bit like UNIX diff is able to specify filenames. This could be used for
    single or bulk updates.
    Yes, a content-type was something I suggested, but it didn't seem
    right. In a strictly RESTful sense, maybe it does make sense however.
    Of course, this feels very similar to your original proposal, which leaves
    me a little confused. Throwing about JSON with keys such as "POST" and
    "DELETE" feels very RPC-like. Perhaps the difference is the use of a
    separate media type.
    These items, such as 'post' and 'delete', can be equated to the
    'replace'/'insert' et al of my diff proposal, but operating over
    documents rather than JSON trees. The fact that they are so named was
    *purely* an attempt on my part to make it obvious which (singular)
    resource operation (identified by HTTP method) was equivalent to that
    document-level operation, and was in no way an attempt to tunnel the
    HTTP mechanism.

    The current way _bulk_docs does deletion doesn't feel right. I do
    think there should be some isomorphism between the _bulk_docs
    structure and the operations one would do without using the _bulk_docs
    mechanism, hence my suggestion (but the second temporal ordering, not
    the first operation-type ordering).
    I am eager to be corrected by any resident RESTafarians. For me, REST is a
    bit like Zen. Sometimes I think I understand it totally, and other times
    I'm convinced that I don't understand it at all.
    I don't think Couch is truly REST. Certainly _bulk_docs isn't. The
    fact that there are URI patterns means it's not REST, at least not if
    I've understood Roy's recent communications/frustrations, such as
    http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven.

    In particular, point 4 seems to disqualify any system, (including
    Couch) that needs the documents in the "Reference" section of the Wiki.

    To be REST it has to be just like the web. Using links discovered from
    documents, never constructing them according to some scheme.

    However, what does it matter? REST certainly is a slippery sucker, but
    that may be because we want it to be more generally applicable than it
    is. Couch doesn't have to be REST, and I suspect that it in fact
    cannot be.

    Antony Blakey
    --------------------------
    CTO, Linkuistics Pty Ltd
    Ph: 0438 840 787

    Man will never be free until the last king is strangled with the
    entrails of the last priest.
    -- Denis Diderot
  • Noah Slater at Nov 14, 2008 at 12:36 pm

    On Fri, Nov 14, 2008 at 10:52:35PM +1030, Antony Blakey wrote:
    I am eager to be corrected by any resident RESTafarians. For me, REST is a
    bit like Zen. Sometimes I think I understand it totally, and other times I'm
    convinced that I don't understand it at all.
    I don't think Couch is truly REST. Certainly _bulk_docs isn't. The fact that
    there are URI patterns means it's not REST, at least not if I've understood
    Roy's recent communications/frustrations, such as
    http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven.

    In particular, point 4 seems to disqualify any system, (including Couch) that
    needs the documents in the "Reference" section of the Wiki.

    To be REST it has to be just like the web. Using links discovered from
    documents, never constructing them according to some scheme.

    However, what does it matter? REST certainly is a slippery sucker, but that
    may be because we want it to be more generally applicable than it is. Couch
    doesn't have to be REST, and I suspect that it in fact cannot be.
    Sure, there are some areas, such as hypertext as the engine of application
    state, that CouchDB does not use, but looking back at Roy's original doctoral
    thesis, REST seems to be predominantly about architecture constraints, of which
    this was not one of them. CouchDB embraces all of the mentioned constraints in
    some way or another; namely client/server, statelessness, cacheability, uniform
    interfaces, and layered systems.

    So, I guess RESTful or non-RESTful is a false dichotomy in this respect.

    Additionally, I agree with you on the state of current bulk operations. I think
    there is room for improvement, and hopefully some kind of differential update
    could be possible at the same time.
  • Noah Slater at Nov 14, 2008 at 12:41 pm

    On Fri, Nov 14, 2008 at 12:35:45PM +0000, Noah Slater wrote:
    Sure, there are some areas, such as hypertext as the engine of application
    state, that CouchDB does not use, but looking back at Roy's original doctoral
    thesis, REST seems to be predominantly about architecture constraints, of
    which this was not one of them. CouchDB embraces all of the mentioned
    constraints in some way or another; namely client/server, statelessness,
    cacheability, uniform interfaces, and layered systems.
    Careless wording on my part, hypertext is clearly an architectural
    constraint. However, the weight placed on it within his thesis does not seem to
    be as great as some of the other core constraints.
  • Chris Anderson at Nov 14, 2008 at 5:41 pm

    On Fri, Nov 14, 2008 at 4:22 AM, Antony Blakey wrote:
    I don't think Couch is truly REST. Certainly _bulk_docs isn't. The fact that
    there are URI patterns means it's not REST, at least not if I've understood
    Roy's recent communications/frustrations, such as
    http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven.

    In particular, point 4 seems to disqualify any system, (including Couch)
    that needs the documents in the "Reference" section of the Wiki.

    To be REST it has to be just like the web. Using links discovered from
    documents, never constructing them according to some scheme.
    Ah, shaving the yak shed.

    But I've been thinking about this as well. If we were to attack this
    problem of "High REST" head-on, I think the appropriate course would
    be to define a media type application/couch+json or something. The
    media type's definition would explain how to get from "id" params to
    document URIs, etc. Doing that is all it would take to be (mostly)
    RESTful. I think the existence of _bulk_docs POST doesn't break
    RESTfulness, either. There's no law that says a system can't define
    RESTful resources alongside RPC endpoints.

    I'm not sure how meditating on the Zen of REST will help us get
    json-diffs right, but it sure can't hurt.

    Chris





    --
    Chris Anderson
    http://jchris.mfdz.com
  • Antony Blakey at Nov 14, 2008 at 9:57 pm

    On 15/11/2008, at 4:10 AM, Chris Anderson wrote:

    But I've been thinking about this as well. If we were to attack this
    problem of "High REST" head-on, I think the appropriate course would
    be to define a media type application/couch+json or something. The
    media type's definition would explain how to get from "id" params to
    document URIs, etc.
    I think some equivalent to base-uri would be needed to avoid the
    definition of the media type including requirements on the URL
    structure of the server. Either that, or the document would include the
    resource URL.

    A landing page with URLs for the design documents would also be
    needed. View definitions would need a unique media type because
    currently their meaning is dependent on their location. But maybe I'm
    misunderstanding REST. So easy.
    Doing that is all it would take to be (mostly)
    RESTful. I think the existence of _bulk_docs POST doesn't break
    RESTfulness, either. There's no law that says a system can't define
    RESTful resources alongside RPC endpoints.
    Agreed, but IMO Couch shouldn't claim to be at all RESTful if it
    doesn't meet the criteria. It might be REST-like. If some parts are
    RESTful and others not, then the claim should be that it includes some
    RESTful interfaces.

    It might seem nitpicking, but the definition of REST is voided when
    things claim to be RESTful that in fact aren't, and it's rarely used
    correctly. I'm not even sure what conformance looks like.
    I'm not sure how meditating on the Zen of REST will help us get
    json-diffs right, but it sure can't hurt.
    Sorry, I distracted the discussion when I mentioned _bulk_docs.

    I would start working on an implementation of the apply-end of my diff
    proposal, but I don't want to waste time if the powers-that-be don't
    think it's the right way to go.

    Antony Blakey
    -------------
    CTO, Linkuistics Pty Ltd
    Ph: 0438 840 787

    Some defeats are instalments to victory.
    -- Jacob Riis
  • Antony Blakey at Nov 14, 2008 at 10:31 pm

    On 15/11/2008, at 8:26 AM, Antony Blakey wrote:

    A landing page with URLs for the design documents would also be
    needed. View definitions would need a unique media type because
    currently their meaning is dependent on their location. But maybe
    I'm misunderstanding REST. So easy.
    Thinking about this, it would not only be RESTful to have the server
    root page contain links such as the _bulk_docs URL and a _design/
    index page, it would also make good documentation if it was an HTML
    page. The name of the anchor or the rel attribute could serve to
    indicate link functions.

    <a href='_bulk_docs' rel='bulkDocumentsRPC'>Bulk Document
    Operations</a>
    <a href='_design/'>Design Document Index</a>

    I'm guessing that it would be wrong to annotate the _design link with
    a rel because a document isn't a design document by virtue of its
    URL, and the _design/ URL is really just a view. This suggests to me
    that maybe _design/ shouldn't be hard-coded, but should be just
    another view defined using the existing mechanism e.g. _view/_design.
    This touches on the recent discussion about design docs being passed
    to views.
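
    A small sketch of how a client might discover such links instead of
    constructing URLs, using Python's standard html.parser. The root page
    text is the hypothetical example from above, not something CouchDB
    actually serves.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (rel, href) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if "href" in attrs:
                self.links.append((attrs.get("rel"), attrs["href"]))


# Hypothetical server root page, as proposed in this message.
ROOT_PAGE = """
<a href='_bulk_docs' rel='bulkDocumentsRPC'>Bulk Document Operations</a>
<a href='_design/'>Design Document Index</a>
"""

collector = LinkCollector()
collector.feed(ROOT_PAGE)
# Index only the links that declare a rel, so clients can look them up by function.
by_rel = {rel: href for rel, href in collector.links if rel}
```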

    IMO the reference docs on the Wiki really belong with the code, and an
    obvious feature would be to serve those documents from the server.

    I wonder if this idea conforms to this requirement:

    "A REST API should spend almost all of its descriptive effort in
    defining the media type(s) used for representing resources and driving
    application state, or in defining extended relation names and/or
    hypertext-enabled mark-up for existing standard media types."

    Antony Blakey
    -------------
    CTO, Linkuistics Pty Ltd
    Ph: 0438 840 787

    Borrow money from pessimists - they don't expect it back.
    -- Steven Wright
  • Noah Slater at Nov 15, 2008 at 2:18 am
    I think it would be extremely beneficial if we made CouchDB provide self
    descriptive hyperlinks that let clients explore the available URI space. Along
    with a set of properly defined media types, this could go a long way towards
    making CouchDB a truly RESTful database management system.
  • Noah Slater at Nov 19, 2008 at 8:36 am
  • Damien Katz at Nov 13, 2008 at 5:01 pm
    I was planning on something similar to this for field and attachment
    level replication, where only the fields or attachments that are
    changed are replicated. With the scheme I'm thinking of, it's possible
    to have it incremental at any nested level of the doc tree, but I'm
    not sure the extra overhead is worth doing it beyond the root fields.
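
    A sketch of what a root-fields-only diff could look like. This is an
    illustration only, not the scheme Damien has in mind; the function
    names and the "set"/"unset" layout are assumptions.

```python
def root_field_diff(old, new):
    """Describe how `new` differs from `old`, looking at root fields only."""
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = sorted(k for k in old if k not in new)
    return {"set": changed, "unset": removed}


def apply_root_field_diff(doc, diff):
    """Return a copy of `doc` with a root-field diff applied."""
    result = {k: v for k, v in doc.items() if k not in diff["unset"]}
    result.update(diff["set"])
    return result
```

    A change anywhere inside a nested value resends that whole root
    field, which is the trade-off against diffing at every nested level.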

    However, Michael's concern of the document getting larger and the app
    getting slower still applies: the document must still be loaded into
    memory on the server and the diffs applied, and the complete doc will
    need to be loaded into memory for view indexing too. Michael,
    regardless of the diff updates, I'm thinking you need to break your
    document up into multiple documents.

    -Damien
    On Nov 13, 2008, at 11:40 AM, Ayende Rahien wrote:

    I think that this should be pretty easily done using:
    a) well defined pretty format output
    b) standard diff

    The reason for (a) is that you need this to get line breaks, which are
    critical to diffing correctly.
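
    A minimal sketch of the "pretty format plus standard diff" idea using
    Python's json and difflib modules. The function name and the choice
    of a 2-space indent with sorted keys are assumptions; any fixed,
    well-defined layout would do.

```python
import difflib
import json


def text_diff(old_doc, new_doc):
    """Unified diff of two documents after canonical pretty-printing."""
    def pretty(doc):
        # sort_keys and a fixed indent give a stable one-member-per-line layout
        return json.dumps(doc, indent=2, sort_keys=True).splitlines()

    return list(difflib.unified_diff(pretty(old_doc), pretty(new_doc),
                                     "old", "new", lineterm=""))


for line in text_diff({"a": 1, "b": 2}, {"a": 1, "b": 3}):
    print(line)
```

    One known wrinkle with line-based diffs of JSON: adding a member at
    the end of an object also changes the previous line, because it
    gains a trailing comma.
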
    On Thu, Nov 13, 2008 at 6:38 PM, Noah Slater wrote:
    On Thu, Nov 13, 2008 at 08:30:17AM -0800, Michael Ramirez wrote:
    Will this cause bandwidth issues when updating large documents if only a
    single field changes. I am afraid that as my documents grow larger my app
    gets slower.
    I for one am interested to hear JSON diff proposals. I think this would
    make a great addition to CouchDB. As best I can tell, this should really
    be done as an external standardisation effort so the whole community could
    benefit. I don't think using JavaScript to set the document attributes is
    a very good solution to this. An entirely new Media Type is needed, IMHO.

    --
    Noah Slater, http://tumbolia.org/nslater
  • Michael Ramirez at Nov 13, 2008 at 5:16 pm
    If I begin breaking up my documents into related documents aren't I just creating a relational database?


    Michael

  • Noah Slater at Nov 13, 2008 at 5:24 pm

    On Thu, Nov 13, 2008 at 09:14:47AM -0800, Michael Ramirez wrote:
    If I begin breaking up my documents into related documents aren't I just
    creating a relational database?
    Well, I don't think Damien was dismissing differential updates.

    I do disagree with Damien on his points about root level changes. I think a
    generic JSON diff format would be hugely advantageous. Again, this is the kind
    of thing that would need to be standardised and baked into JSON client libraries
    before it could be used properly though.

    I think Damien was pointing out that no matter if you have differential updates,
    the size of the documents still affects performance; disk IO, memory and view
    calculation all suffer. So, there is a balance to strike between convenience and
    performance. It is entirely up to you how that should be addressed per app.
  • Ara.t.howard at Nov 13, 2008 at 5:25 pm

    On Nov 13, 2008, at 10:14 AM, Michael Ramirez wrote:

    If I begin breaking up my documents into related documents aren't I
    just creating a relational database?
    this is where i find myself too: all the modeling issues with couch
    seem to elicit this suggestion. the issue is that it's quite difficult
    to manipulate multiple docs without facilities like 'select for
    update' and 'begin transaction'. so far the only approach i've come
    up with, once docs are split out, is to read them all, perform the
    update, and then write them all back. otherwise any computed value
    risks being based on stale data.
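
    For a single document, the read-update-write cycle can at least
    detect stale data through revision checks. The sketch below uses an
    in-memory stand-in for the database with integer revisions as a
    simplification; the class and function names are hypothetical, and it
    does not solve the multi-document case described above.

```python
import copy


class Conflict(Exception):
    """Raised when a write carries a stale _rev."""


class Store:
    """In-memory stand-in for a database that checks _rev on every write."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return copy.deepcopy(self._docs[doc_id])

    def put(self, doc):
        current = self._docs.get(doc["_id"])
        if current is not None and current["_rev"] != doc.get("_rev"):
            raise Conflict(doc["_id"])
        next_rev = current["_rev"] + 1 if current else 1
        self._docs[doc["_id"]] = dict(copy.deepcopy(doc), _rev=next_rev)
        return next_rev


def update_with_retry(store, doc_id, mutate, attempts=5):
    """Re-read, mutate and write until the write is not rejected as stale."""
    for _ in range(attempts):
        doc = mutate(store.get(doc_id))
        try:
            return store.put(doc)
        except Conflict:
            continue
    raise Conflict(doc_id)
```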

    it really does seem strange to me that so many solutions to couch
    involve re-creating relational constructs - like there must be a
    better way....

    a @ http://codeforpeople.com/
    --
    we can deny everything, except that we have the possibility of being
    better. simply reflect on that.
    h.h. the 14th dalai lama
  • Damien Katz at Nov 13, 2008 at 5:32 pm
    Not necessarily relational, it depends on the use case and how much
    you can denormalize. But if the document keeps growing, is it really a
    document, or a bunch of documents bound together?

    While some XML databases allow documents that are gigabytes or even
    terabytes in size, CouchDB documents are meant to be individually held
    in-memory. And while both operate on documents, the query and access
    models differ greatly. It might be that an XML or relational database
    is a better fit for your app.

    -Damien

    On Nov 13, 2008, at 12:14 PM, Michael Ramirez wrote:

    If I begin breaking up my documents into related documents aren't I
    just creating a relational database?


    Michael

    ----- Original Message ----
    From: Damien Katz <damien@apache.org>
    To: couchdb-user@incubator.apache.org
    Sent: Thursday, November 13, 2008 10:00:44 AM
    Subject: Re: Document Updates

    I was planning on something similar this for field and attachment
    level replication, where only the fields or attachments that are
    changed are replicated. With the scheme I'm thinking of, it's
    possible to have it incremental at any nested level of the doc tree,
    but I'm not sure the extra overhead is worth doing it beyond the
    root fields.
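
    As an illustration of what a diff limited to the root fields could look like, here is a minimal sketch. The function names and the `{"set": ..., "unset": ...}` shape are assumptions made up for this example, not a CouchDB format or the scheme being planned.

```python
_MISSING = object()  # sentinel so a stored None is not confused with "absent"


def root_diff(old, new):
    """Return the root-level changes that turn `old` into `new`."""
    changed = {k: v for k, v in new.items() if old.get(k, _MISSING) != v}
    removed = [k for k in old if k not in new]
    return {"set": changed, "unset": removed}


def apply_root_diff(doc, diff):
    """Apply a diff produced by root_diff and return the new document."""
    result = {k: v for k, v in doc.items() if k not in diff["unset"]}
    result.update(diff["set"])
    return result


old = {"title": "Notes", "body": "x" * 1000, "tags": ["a"], "draft": True}
new = {"title": "Notes", "body": "x" * 1000, "tags": ["a", "b"]}

diff = root_diff(old, new)
rebuilt = apply_root_diff(old, diff)
```

    Only `tags` and the removal of `draft` travel in the diff; the large `body` field is left out. A change nested inside a root field still resends that whole field, which is the trade-off of stopping at the root level.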

    However, Michael's concern of the document getting larger and the
    app getting slower still applies: the document must still be loaded
    into memory on the server and the diffs applied, and the complete
    doc will need to be loaded into memory for view indexing too.
    Michael, regardless of the diff updates, I'm thinking you need to
    break your document up into multiple documents.
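
    One way to picture breaking a growing document into several, assuming a hypothetical blog post that embeds its comments: the field names `type`, `post_id` and `comments`, the id scheme, and the function `split_post` are all invented for this sketch.

```python
def split_post(post):
    """Split a post with embedded comments into one parent and many small docs."""
    # The parent keeps everything except the ever-growing list.
    parent = {k: v for k, v in post.items() if k != "comments"}
    parent["type"] = "post"

    children = []
    for n, comment in enumerate(post.get("comments", [])):
        child = dict(comment)
        child.update({
            "_id": f"{post['_id']}:comment:{n}",
            "type": "comment",
            "post_id": post["_id"],  # back-reference used to regroup later
        })
        children.append(child)
    return parent, children


post = {
    "_id": "post-1",
    "title": "Hello",
    "comments": [
        {"author": "ann", "text": "hi"},
        {"author": "bob", "text": "yo"},
    ],
}

parent, children = split_post(post)
```

    With this layout, adding a comment means writing one small document instead of resending the whole post, and the post document itself stops growing.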

    -Damien
  • Antony Blakey at Nov 13, 2008 at 9:39 pm

    On 14/11/2008, at 3:30 AM, Damien Katz wrote:

    I was planning on something similar to this for field and attachment
    level replication, where only the fields or attachments that are
    changed are replicated.
    Are you planning on sending attachment deltas e.g. rsync/unison? That
    would be enormously useful for my CouchDB app.
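
    For readers unfamiliar with the idea, a much simplified sketch of a block-level delta follows. It is not the rsync or unison algorithm (rsync uses a rolling checksum to find matches at any offset); this version compares fixed block boundaries only, so it helps with in-place changes but not with insertions that shift the data. All names here are made up for the example.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow


def block_hashes(data, block=BLOCK):
    """Hash each fixed-size block of the copy the receiver already holds."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def make_delta(old_hashes, new_data, block=BLOCK):
    """Emit ('keep', index) for known blocks and ('data', bytes) otherwise."""
    known = {h: i for i, h in enumerate(old_hashes)}
    delta = []
    for i in range(0, len(new_data), block):
        chunk = new_data[i:i + block]
        h = hashlib.sha256(chunk).hexdigest()
        delta.append(("keep", known[h]) if h in known else ("data", chunk))
    return delta


def apply_delta(old_data, delta, block=BLOCK):
    """Rebuild the new attachment from the old copy plus the delta."""
    out = bytearray()
    for kind, value in delta:
        if kind == "keep":
            out += old_data[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)


old = b"aaaabbbbccccdddd"
new = b"aaaaXXXXccccdddd"

delta = make_delta(block_hashes(old), new)
rebuilt = apply_delta(old, delta)
```

    Here only the one changed block is sent as literal bytes; the other three are referenced by index.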

    Antony Blakey
    -------------
    CTO, Linkuistics Pty Ltd
    Ph: 0438 840 787

    A Man may make a Remark –
    In itself – a quiet thing
    That may furnish the Fuse unto a Spark
    In dormant nature – lain –

    Let us divide – with skill –
    Let us discourse – with care –
    Powder exists in Charcoal –
    Before it exists in Fire –

    -– Emily Dickinson 913 (1865)
