On Oct 1, 2014, at 4:16 PM, Eric B wrote:
>> Attachments are identified by name and can be replaced without mutating
>> old references to documents with attachments of
>> the same name.
>
> This is where you lose me a little. How can I have multiple references to
> the same attachment? Am I not able to have 2 documents with 2 distinct
> attachments with the same name? For example, if each user uploads a
> "photo.jpg" that is attached to their profile?
>
> Or are you referring to the ability to retrieve an older rev of the
> document and retrieve the older rev of the attachment? For example, in
> rev1 of a doc I attach photo.jpg and in rev2 I update the photo.jpg. Do
> you mean I can still retrieve rev1 and the original photo.jpg?

Yes. I just wanted to make it clear that you have a version of each attachment
at a given revpos. So this means that you can replace or delete the attachment,
but old revisions will still reference it until they are culled by compaction. If you
have conflicting revisions they will both be able to keep different attachments
under the same name.
DocA-1 + cat.gif (md5-hash 1234, revpos 1)
DocA-2 + cat.gif (md5-hash 1234, revpos 1)
[at this point we only have one copy]
DocA-3 + cat.gif (md5-hash 7890, revpos 3)
[now I have two different gifs but I can get both until I compact]
Compact [DocA-3 is the only leaf revision with no conflicts]
[now I should only have one gif (7890)]
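
The sequence above can be sketched as a toy model in Python. This is only an illustration of the bookkeeping, not how CouchDB implements storage or compaction; the Attachment class and the stored_digests and compact helpers are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    name: str
    digest: str   # content hash of the attachment body
    revpos: int   # revision position at which this body was added


# Hypothetical model of DocA: revision number -> attachments by name.
revisions = {
    1: {"cat.gif": Attachment("cat.gif", "md5-1234", 1)},
    2: {"cat.gif": Attachment("cat.gif", "md5-1234", 1)},  # unchanged body
    3: {"cat.gif": Attachment("cat.gif", "md5-7890", 3)},  # replaced body
}


def stored_digests(revs):
    """Distinct attachment bodies still referenced by any kept revision."""
    return {att.digest for atts in revs.values() for att in atts.values()}


def compact(revs, leaves):
    """Toy compaction: keep only the bodies of the leaf revisions."""
    return {rev: atts for rev, atts in revs.items() if rev in leaves}


before = stored_digests(revisions)                      # both gifs reachable
after = stored_digests(compact(revisions, leaves={3}))  # only the newest one
print(sorted(before), sorted(after))
```

With a conflict, the leaves set would hold more than one revision, so more than one body could survive under the same name.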
If we had conflicts, then we’d possibly have more images. You can play around
with this more by using the atts_since=N query parameter. Keep in mind that
content is not currently deduplicated between different documents, so that is
where the application can do some work: using a digest like SHA1 or similar
as the document id ensures that only one copy of anything is stored. This
artificially restricts you to one attachment per doc, but I find things work a bit better
when you avoid having huge numbers of attachments to manage per