I'm seeing replication behavior that I don't understand. I wonder if
it's stalled.

I've got two CouchDB servers, each with four databases. A cron job runs
once a minute and tells each server to do continuous replication from
each database on the other server. What I'm seeing for one of the
databases has me confused: in the couch.log for the "source" database,
I see 'GET' entries consistent with the destination database fetching
documents and their attachments. But on the destination database, the
fetched documents and attachments do not appear.
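
For readers unfamiliar with the setup, here is a minimal sketch of the kind of request such a cron job might send. It assumes the job uses CouchDB's `POST /_replicate` endpoint with a remote URL as the source and a local database name as the target (pull replication). The helper name `build_replication_request` is hypothetical, and the host names are borrowed from the Futon output in this post purely for illustration.

```python
import json
import urllib.request


def build_replication_request(local_server, remote_server, db, continuous=True):
    """Build a POST /_replicate request asking local_server to pull db from remote_server."""
    body = {
        "source": f"{remote_server}/{db}",  # remote URL, so this is a pull
        "target": db,                       # database name on local_server
        "continuous": continuous,
    }
    return urllib.request.Request(
        url=f"{local_server}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative only: which host pulls from which is an assumption.
req = build_replication_request("http://carbon:5984", "http://sodium:5984", "ps")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```

The request is only built here, not sent, so the sketch can be inspected without a running server.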

To answer the question, "Are the GET entries coming from some other
instance of couchdb?", I stopped couchdb on the destination server. The
'GET' entries in the log of the source server stopped. I then restarted
the destination couchdb server and the log entries resumed.

Futon on the source database shows:

  Overview:
    Name: ps
    Size: 181.5 GB
    Number of Documents: 43,090
    Update Seq: 43,741
  Status:
    Type: Replication
    Object: 49a5c5: http://carbon:5984/ps/ -> ps
    PID: <0.14439.1841>
    Status: MR Processed source update #6114

Futon on the destination database shows:

  Overview:
    Name: ps
    Size: 58.3 GB
    Number of Documents: 6,107
    Update Seq: 6,114
  Status:
    Type: Replication
    Object: 0c52c5: http://sodium:5984/ps/ -> ps
    PID: <0.234.0>
    Status: Starting

The status on the destination database has been "Starting" since I
restarted couchdb on the destination server about 15 minutes ago.
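
The same status can be polled without Futon. The sketch below assumes that `GET /_active_tasks` returns a JSON list whose entries carry `type`, `task`, `status` and `pid` fields mirroring the Futon columns above. Those field names are an assumption about 1.0.x output and should be checked against a real response. The sample data is copied from the destination status shown earlier, and the function names are hypothetical.

```python
import json

# Sample shaped like the Futon status row above. The JSON field names are an
# assumption about what GET /_active_tasks returns on CouchDB 1.0.x.
SAMPLE = json.loads("""[
  {"type": "Replication",
   "task": "0c52c5: http://sodium:5984/ps/ -> ps",
   "status": "Starting",
   "pid": "<0.234.0>"}
]""")


def replication_tasks(active_tasks):
    """Return only the replication entries from an _active_tasks response."""
    return [t for t in active_tasks if t.get("type") == "Replication"]


def still_starting(active_tasks):
    """Return the task names of replications whose status is still 'Starting'."""
    return [t["task"] for t in replication_tasks(active_tasks)
            if t.get("status") == "Starting"]


print(still_starting(SAMPLE))
```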

Both the source and destination databases are being written to by user
processes on an intermittent basis: anywhere from 0 to a few dozen
documents per minute, each document with up to a few dozen megabytes of
attachments.

I see no error entries in either the source or destination server's
couch.log.

Versions:
Couchdb: 1.0.2
OS: Linux 2.6.32 (AMD 64)

Why don't I see any documents being added to the destination database?


  • Wayne Conrad at Mar 1, 2011 at 2:41 am

    On 02/21/2011 11:18 AM, Wayne Conrad wrote:
    > I'm seeing replication behavior that I don't understand. I wonder if
    > it's stalled.

    It's not stalled. It's going very, very slowly. I think I understand why.

    Some of my documents have tens of thousands of attachments. When I
    first started storing the fat documents in couchdb, it took half an hour
    or more to add them. To make it faster, and to prevent timeouts, I
    store the attachments inline, but in chunks of 100 attachments at a
    time. Doing that, even my largest documents take only a minute or so to
    store.
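
A minimal sketch of the batching idea described above, assuming attachments are sent inline as base64 in the document's `_attachments` field. The helper names `batches` and `inline_attachments` are hypothetical, and the sketch only builds the per-update fragments. A real update would also need the document's current `_rev` and the stubs for attachments stored in earlier updates.

```python
import base64
from itertools import islice


def batches(items, size=100):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def inline_attachments(batch, content_type="application/octet-stream"):
    """Build an `_attachments` fragment with base64-encoded inline data."""
    return {
        name: {
            "content_type": content_type,
            "data": base64.b64encode(data).decode("ascii"),
        }
        for name, data in batch
    }


# Illustrative data: 250 attachments of 4096 bytes each.
files = [(f"att{i:05d}", b"\0" * 4096) for i in range(250)]
fragments = [inline_attachments(b) for b in batches(files, size=100)]
print([len(f) for f in fragments])
```

Each fragment would go out in one document update, so 250 attachments take three updates instead of 250.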

    I can store a document with 32,768 attachments of 4k each (128 MB in
    all) in 55 seconds (about 2.4 MB/sec). But to replicate that document
    (using "pull" replication) takes 19.5 minutes. That's about 115k per
    second. Storing, then, is about 20 times faster than replicating. When
    I look at the log on the source database,
    I see that the destination database is retrieving one attachment at a
    time, and (I presume) experiencing the same speed problem that caused me
    to write my "store bunches of attachments at a time" optimization. Now
    it seems that, in order for replication to have any chance of keeping up
    with the rate at which I can store data, I'm going to need the same sort
    of optimization during replication.
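
The throughput figures above can be checked with a few lines of arithmetic. This sketch assumes "4k" means 4,096 bytes; the variable names are illustrative.

```python
ATTACHMENTS = 32_768
ATTACHMENT_BYTES = 4 * 1024          # assumption: "4k" means 4096 bytes

total_bytes = ATTACHMENTS * ATTACHMENT_BYTES
store_seconds = 55
replicate_seconds = 19.5 * 60

store_rate = total_bytes / store_seconds          # bytes per second
replicate_rate = total_bytes / replicate_seconds  # bytes per second
ratio = replicate_seconds / store_seconds

print(f"total:     {total_bytes / 2**20:.0f} MiB")
print(f"store:     {store_rate / 1e6:.2f} MB/s")
print(f"replicate: {replicate_rate / 1e3:.0f} kB/s")
print(f"ratio:     {ratio:.1f}x")
```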

    I'm a couch toddler, and when it comes to Erlang, I'm not even on solid
    food yet. What are the odds of me writing my own replication engine in,
    say, Ruby, one that can do the special optimizations I need? How
    difficult a project is it?
  • Filipe David Manana at Mar 1, 2011 at 10:16 am

    On Tue, Mar 1, 2011 at 2:41 AM, Wayne Conrad wrote:
    > I'm a couch toddler, and when it comes to Erlang, I'm not even on solid food
    > yet.  What are the odds of me writing my own replication engine in, say,
    > Ruby, one that can do the special optimizations I need?  How difficult a
    > project is it?

    You can also try the new replicator that recently landed in trunk
    (from which 1.2 will be cut). It adds more parallelism, so you'll
    likely see better performance:

    http://s.apache.org/KsY

    https://github.com/apache/couchdb/commit/34eb4175f8547546cd76fbeb006421020cbf0d71


    --
    Filipe David Manana,
    fdmanana@gmail.com, fdmanana@apache.org

    "Reasonable men adapt themselves to the world.
    Unreasonable men adapt the world to themselves.
    That's why all progress depends on unreasonable men."
  • Wayne Conrad at Mar 1, 2011 at 6:54 pm

    On 03/01/11 03:15, Filipe David Manana wrote:
    > You can also try the new replicator that recently landed in trunk
    > (from which 1.2 will be cut). It adds more parallelism, so you'll
    > likely see better performance:
    >
    > http://s.apache.org/KsY
    >
    > https://github.com/apache/couchdb/commit/34eb4175f8547546cd76fbeb006421020cbf0d71

    Version 1.2.0ac052866-git replicates my ginormous documents with
    panache. My test-case-du-jour is a document with 32,700 attachments, 4k
    each. Inserting 100 attachments at a time, the document gets inserted
    into the source database in 42 seconds. The destination database is
    pulling the changes so fast I can't measure the latency by refreshing
    Futon: The moment I'm done adding the last attachment to the source
    database, it appears in the destination database. Outstanding!

    I had only one small issue building it: on Debian, the package
    "erlang-eunit" is required to run "make check." Is that worth
    mentioning in the DEVELOPER file?

    Before I put it in production, "make check" reports some failures. Do
    these matter?

    Test Summary Report
    -------------------
    /var/lib/couchdb/custom-install-tree/couchdb/test/etap/160-vhosts.t (Wstat: 0 Tests: 10 Failed: 0)
      Parse errors: Bad plan. You planned 14 tests but ran 10.
    /var/lib/couchdb/custom-install-tree/couchdb/test/etap/173-os-daemon-cfg-register.t (Wstat: 0 Tests: 8 Failed: 0)
      Parse errors: Bad plan. You planned 27 tests but ran 8.
    Files=36, Tests=724, 52 wallclock secs ( 0.15 usr 0.04 sys + 11.29 cusr 1.16 csys = 12.64 CPU)
    Result: FAIL

    Oh, and: Can I try this out by just using this version on the
    destination database, or does the source database also need this version?
  • Filipe David Manana at Mar 1, 2011 at 7:27 pm

    On Tue, Mar 1, 2011 at 6:54 PM, Wayne Conrad wrote:
    > Before I put it in production, "make check" reports some failures.  Do
    > these matter?

    No, they're related to the vhost and OS daemon features. The former was
    fixed this morning, I believe, while the latter happens often but not
    always. They're completely unrelated to the replicator, rest assured
    :)

    > Oh, and: Can I try this out by just using this version on the destination
    > database, or does the source database also need this version?

    This new replicator should work flawlessly with CouchDB 1.0.2+. Pull
    replicating (when there are compressed attachments in the source) from
    a 1.0.1 (or below) server might not work under some circumstances;
    this was fixed in 1.0.2 (a fix to the document multipart/mixed and
    multipart/related streaming APIs).

    Glad to know it's working much better for you compared to the old replicator :)

    regards,

  • Adam Kocoloski at Mar 1, 2011 at 7:35 pm

    On Mar 1, 2011, at 2:26 PM, Filipe David Manana wrote:
    > On Tue, Mar 1, 2011 at 6:54 PM, Wayne Conrad wrote:
    >> I had only one small issue building it: on Debian, the package
    >> "erlang-eunit" is required to run "make check." Is that worth
    >> mentioning in the DEVELOPER file?

    I'm a bit surprised by this one; I thought we had figured out how to avoid that. Bob?

    >> Before I put it in production, "make check" reports some failures. Do
    >> these matter?
    > No, they're related to the vhost and OS daemon features. The former was
    > fixed this morning, I believe, while the latter happens often but not
    > always. They're completely unrelated to the replicator, rest assured
    > :)

    Yes, the errors in 173 have happened in each of the last four CI builds (50-53) for R14B01:

    http://jenkins.cloudant.com/job/CouchDB-trunk-R14B01/

    I've discussed them with Paul and one of these days we'll get around to making that test less timing-sensitive.

    Thanks for the detailed report Wayne. Cheers, Adam
  • Robert Newson at Mar 1, 2011 at 7:44 pm
    I mentioned the undeclared dependency on eunit and the fact that
    Debian packaged it separately some months ago. In addition, I passed a
    patch to mochiweb (where the dependency comes from) to fix it, but
    failed to chase it up.

    B.
  • Paul Davis at Mar 1, 2011 at 7:55 pm

    On Tue, Mar 1, 2011 at 2:34 PM, Adam Kocoloski wrote:
    > Yes, the errors in 173 have happened in each of the last four CI builds (50-53) for R14B01:
    >
    > http://jenkins.cloudant.com/job/CouchDB-trunk-R14B01/
    >
    > I've discussed them with Paul and one of these days we'll get around to making that test less timing-sensitive.

    Yeah, I tried rewriting this one in shell but got stuck on how to get
    the script to close when stdin was closed. If anyone knows a bit of
    shell foo that can do that, feel free to let me in on the secret.

Discussion Overview
group: user @
categories: couchdb
posted: Feb 21, '11 at 6:19p
active: Mar 1, '11 at 7:55p
posts: 8
users: 5
website: couchdb.apache.org
irc: #couchdb
