
[CouchDB-user] Limit on the number of databases?

Glenn Bech
May 26, 2011 at 10:23 am
Hi,

I just want to ask whether there is a limit on the number of databases in
Couch. I am playing around with embedded Couch on Android and am thinking
along the lines of having one database per user, and using replication to
push data from the client to the server. This would provide an excellent
"offline" user experience.

This will of course not work if Couch does not handle an unlimited number of
databases well, in terms of performance or otherwise.

Does this sound like a feasible design solution?

Regards,

Glenn

7 responses

  • Marcos Ortiz at May 26, 2011 at 2:30 pm

    I think you should think about your design some more.
    What happens if your service grows to millions of users?

    You might organize these users regionally, for example,
    so that each database holds fewer document entries.

    For example:
    - DB 1: All U.S. users
    - DB 2: All European users
    - DB 3: All Indian users
    - DB 4: All Chinese users

    Remember, it's only a suggestion.

    Regards

    --
    Marcos Luis Ortiz Valmaseda
    Software Engineer (Distributed Systems)
    http://uncubanitolinuxero.blogspot.com
  • Sam Bisbee at May 26, 2011 at 3:16 pm

    I have been looking into this recently with an application design that
    would use thousands of databases - tens to hundreds of thousands after
    a year of usage. From what I have found there are a few considerations
    to weigh...

    - Since each database is a file, you are going to need one file
    descriptor per active database. CouchDB and Erlang have their own
    internal maximums that you can play with. Once their max is reached,
    CouchDB starts to close the oldest file descriptors.

    - Since CouchDB closes file descriptors, if you have a bunch of
    active databases you run the risk of slowing down your machine.
    Increasing the number of open file descriptors is not always the best
    solution, because you could start to see OS level performance issues.

    - Don't forget your OS's max open files. Take a look at ulimit or
    pam if you're on a *nix machine.

    - On a non-performance note, you can't do map/reduce across
    databases. If you plan on referencing between them or combining data,
    then you're probably going to have an index database that some client
    code puts its results into.
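
A minimal sketch of the descriptor check described in the first three points, assuming a Unix-like host. The `reserved` allowance and the helper names are hypothetical. CouchDB's own cap (the `max_dbs_open` configuration setting) has to be checked separately, since this only reads the OS-level limit for the current process.

```python
import resource  # Unix-only standard-library module


def current_soft_limit():
    """Return this process's soft limit on open file descriptors."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def headroom(soft_limit, expected_active_dbs, reserved=256):
    """Descriptors left after the expected number of open database files.

    `reserved` is a hypothetical allowance for sockets, logs and view files.
    A negative result means the limit is too low for the workload.
    """
    return soft_limit - reserved - expected_active_dbs


if __name__ == "__main__":
    soft = current_soft_limit()
    if soft == resource.RLIM_INFINITY:
        print("no soft limit on open files")
    else:
        print("soft limit:", soft)
        print("headroom for 2000 active databases:", headroom(soft, 2000))
```

A negative headroom suggests raising the limit (for example with `ulimit -n`) or lowering the number of databases kept open at once.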

    Cheers,

    --
    Sam Bisbee
    www.sbisbee.com
  • Brian Mitchell at May 26, 2011 at 5:07 pm

    On Thursday, May 26, 2011 at 11:15 AM, Sam Bisbee wrote:

    - On a non-performance note, you can't do map/reduce across
    databases. If you plan on referencing between them or combining data,
    then you're probably going to have an index database that some client
    code puts its results into.
    That or it's reasonable to build a data warehouse which does this in an aggregate database (via replication, possibly filtered). One benefit of having smaller databases is that view generation is cheaper if you want to avoid downtime and don't want to deal with stale views (not an option in many cases).

    I've been investigating this too and will report on success if I achieve it, though at worst I'll just fall back to a larger database with BigCouch and a very large Q-value (shard count).
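
The aggregate-database idea can be sketched as one replication request per user database. This only builds the JSON body that would be POSTed to CouchDB's `/_replicate` endpoint and sends nothing. The database names and the `sync/public_only` filter are hypothetical, and the filter function itself would have to exist in a design document on the source.

```python
import json


def aggregate_replication_body(source_db, target_db, filter_name=None):
    """Build the JSON body for a replication from source_db into target_db."""
    body = {"source": source_db, "target": target_db}
    if filter_name is not None:
        # Filters are referenced as "design_doc_name/filter_function_name".
        body["filter"] = filter_name
    return body


if __name__ == "__main__":
    # Hypothetical per-user databases feeding one "warehouse" database.
    for user_db in ["user_alice", "user_bob"]:
        body = aggregate_replication_body(user_db, "warehouse", "sync/public_only")
        print(json.dumps(body, sort_keys=True))
```

Views are then defined once on the aggregate database, while each per-user database stays small.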

    Brian.
  • Jayesh Thakrar at May 26, 2011 at 5:32 pm
    I am very new to CouchDB, but I wonder whether the approach below could work.

    1. Have one or more independent CouchDB clusters (start with 1 and add more
    as needed).
    2. Lay out a DB naming scheme.
    3. Put an appropriate firewall/router/switch in front of the client
    machines/network and have it redirect the connection/traffic to the
    appropriate server based on the URL of the REST request.
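
One way the three steps above might fit together, as a sketch only: derive a database name from the user id, then pick a cluster from that name. The cluster URLs, the `user_` prefix and the hash-modulo routing are all assumptions made for illustration. The regular expression reflects CouchDB's rule that database names start with a lowercase letter and contain only lowercase letters, digits and `_$()+/-`.

```python
import hashlib
import re

# Hypothetical cluster endpoints; a real deployment would load these from config.
CLUSTERS = [
    "http://couch-a.example.com:5984",
    "http://couch-b.example.com:5984",
]

VALID_DB_NAME = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")


def db_name_for(user_id):
    """Map any user id to a name CouchDB accepts (hex digest, fixed prefix)."""
    digest = hashlib.sha1(user_id.encode("utf-8")).hexdigest()
    return "user_" + digest


def cluster_for(db_name, clusters=CLUSTERS):
    """Pick a cluster deterministically from the database name."""
    index = int(hashlib.sha1(db_name.encode("utf-8")).hexdigest(), 16) % len(clusters)
    return clusters[index]


def url_for(user_id, clusters=CLUSTERS):
    """Full URL of the user's database on its assigned cluster."""
    name = db_name_for(user_id)
    return cluster_for(name, clusters) + "/" + name


if __name__ == "__main__":
    print(url_for("glenn@example.com"))
```

Plain modulo routing moves most databases when a cluster is added, so a lookup table or consistent hashing would be needed to grow without mass migration.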

    -- Jayesh




  • Ajai Khattri at May 26, 2011 at 7:40 pm
    Since we're discussing large numbers of mobile clients: would it be
    possible to do replication in "batches" from a client (i.e. not
    immediately) so that server resources are not continuously tied up?

    On another project I've worked on that involved syncing from mobile
    clients, we developed a scheme where, at the end of the sync process, the
    server tells each client when it should next sync. It allowed
    us to stagger syncing of large numbers of clients across a 24-hour period.
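
A minimal sketch of such staggering, assuming each client gets a fixed slot inside every 24-hour period, derived from a hash of its id. The original scheme's details are not given in this thread, so the slot assignment and the function name here are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def next_sync_time(client_id, now, period=timedelta(hours=24)):
    """Return the next time (strictly after `now`) this client should sync.

    Each client keeps the same offset within every period, so clients are
    spread over the period instead of all syncing at once.
    """
    period_s = int(period.total_seconds())
    digest = hashlib.sha1(client_id.encode("utf-8")).hexdigest()
    offset = int(digest, 16) % period_s          # this client's slot
    elapsed = int((now - EPOCH).total_seconds())
    period_start = elapsed - elapsed % period_s  # start of the current period
    candidate = period_start + offset
    if candidate <= elapsed:                     # slot already passed
        candidate += period_s
    return EPOCH + timedelta(seconds=candidate)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for client in ["phone-1", "phone-2", "phone-3"]:
        print(client, next_sync_time(client, now).isoformat())
```

The server would return this timestamp in its final sync response, and the client would schedule its next replication for that time.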


    --
    Aj.
  • Ajai Khattri at May 26, 2011 at 3:19 pm

    As someone else already pointed out, there is the potential for lots of
    users, so it needs some more thought.

    I am also working on something similar (I'm an Android developer) and also
    thought about one database per user. But I'm thinking it might be better to
    think about sharding from day one, so maybe having a web service that your
    app calls before setting up replication might be a better way to go. The
    web service could assign a specific server to each user so you could
    easily switch to a new server when you started hitting limits.
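
The assignment service could be sketched as an in-memory directory: a user keeps the server first assigned, and new users go to the first server with spare capacity. The class name, the per-server capacity and the URLs are hypothetical, and a real service would persist the assignments.

```python
class ServerDirectory:
    """Sticky assignment of users to CouchDB servers with a capacity cap."""

    def __init__(self, capacity_per_server):
        self.capacity = capacity_per_server
        self.servers = []      # server URLs in the order they were added
        self.load = {}         # server URL -> number of assigned users
        self.assignments = {}  # user id -> server URL

    def add_server(self, url):
        if url not in self.load:
            self.servers.append(url)
            self.load[url] = 0

    def server_for(self, user_id):
        """Return the user's server, assigning one on first contact."""
        if user_id in self.assignments:
            return self.assignments[user_id]
        for url in self.servers:
            if self.load[url] < self.capacity:
                self.load[url] += 1
                self.assignments[user_id] = url
                return url
        raise RuntimeError("all servers are full; add a new server")


if __name__ == "__main__":
    directory = ServerDirectory(capacity_per_server=2)
    directory.add_server("http://couch-1.example.com:5984")
    directory.add_server("http://couch-2.example.com:5984")
    for user in ["ann", "bob", "cat", "ann"]:
        print(user, "->", directory.server_for(user))
```

The app would call this service once before setting up replication and then replicate against the returned server.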



    --
    Aj.
  • Brian Mitchell at May 26, 2011 at 3:22 pm

    I've done some testing and there are a couple things to keep in mind.

    First of all, CouchDB relies directly on the scalability of your filesystem. Each database in CouchDB means at least one file on disk. Since CouchDB currently stores them all in one directory, you'll need to make sure you select a filesystem that can handle your expected scale appropriately (many filesystems should be fine at the level of millions of files, but characteristics can differ, so do test this).

    Another problem, one which I don't have an immediate answer for, is backup. While you could claim replication is enough for this, I'd say it isn't. The events you need backups for also include maliciously destroyed or manipulated data, or simply bugs. I'd rather not trust that my data will never get screwed up by the code that accesses it. Many backup systems are designed around a small number of files. Being able to roll back to a point in time with millions of files could be an extremely painful process. (I have ideas on how to solve this but it's still not an easy problem.)

    Last but not least, consider the number of active databases you'll need at any single time. This can be split across many machines of course but it still adds up quickly. Open file descriptors are great but not if you have to close and then reopen them all the time. A carefully tuned VM can manage many thousands w/o a problem but I wouldn't push this too much higher. So if you have 15 machines and 30k active users for any single 1 minute window, that would be 2k files open and active per machine.
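
The arithmetic in the last paragraph, as a small sketch. It assumes active users are spread evenly across machines and that each active user keeps exactly one database file open. The function names are illustrative.

```python
def open_dbs_per_machine(active_users, machines):
    """Open database files per machine, rounded up, for an even spread."""
    return -(-active_users // machines)  # integer ceiling division


def machines_needed(active_users, max_open_per_machine):
    """Machines required to keep each one at or below its open-file budget."""
    return -(-active_users // max_open_per_machine)


if __name__ == "__main__":
    print(open_dbs_per_machine(30000, 15))  # 30k active users on 15 machines
    print(machines_needed(30000, 2000))     # budget of 2k open files each
```

Running the same numbers the other way gives the machine count for a chosen per-machine budget, which is usually the figure that has to be planned for.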

    Brian.
