Hi,
We've been running Mongoid 2.4 - 2.5 successfully on a number of apps for
some time now. But we recently moved our whole EC2 setup from the US to the
new AWS Sydney region. Since then we have had a lot of connection issues and
app downtime. MongoHQ have set us up with a dedicated replica set in Sydney
that all our apps use, and all of them are having problems. MongoHQ cannot
see any problems with the db.
Our web processes (whether they are Thin, Passenger or Unicorn) all lock up
at various times and bring the app down, and the cause is the Mongo
connection. It seems to be worse when the app has been idle and there have
been no database requests for a while, but it's not consistent. Here is an
excerpt from the log when the Thin process gets locked (this app is running
Mongoid 3). All 4 Thin processes locked up over a period of a few minutes
and each had a Moped COMMAND that ran for about 15 minutes.
[24/Jan/2013 10:14:36] "GET /nmis/0000000000?checksum=7 HTTP/1.0" 200 795 0.0055
MOPED: 11.250.45.113:27017 COMMAND database=admin command={:ismaster=>1} (0.8063ms)
MOPED: 11.250.45.76:27017 COMMAND database=admin command={:ismaster=>1} (931194.7358ms)
Errno::ETIMEDOUT - Connection timed out:
/var/www/msats/vendor/bundle/ruby/1.9.1/gems/moped-1.3.2/lib/moped/sockets/connectable.rb:45:in `read'
/var/www/msats/vendor/bundle/ruby/1.9.1/gems/moped-1.3.2/lib/moped/sockets/connectable.rb:45:in `block in read'
.........
........
[24/Jan/2013 10:31:53] "GET /nmis/0000000000?checksum=7 HTTP/1.0" 200 795 0.0032
MOPED: 11.250.45.113:27017 COMMAND database=admin command={:ismaster=>1} (0.8202ms)
MOPED: 11.250.45.76:27017 COMMAND database=admin command={:ismaster=>1} (2.9578ms)
MOPED: 11.250.45.76:27017 QUERY database=production collection=msats_config selector={"$query"=>{}, "$orderby"=>{:_id=>1}} flags=[:slave_ok] limit=-1 skip=0 batch_size=nil fields=nil (1.4293ms)
[24/Jan/2013 10:31:54] "GET / HTTP/1.0" 200 831 0.3535
Any ideas what is going on? This may be a MongoHQ / AWS problem, but how can
I at least reduce the timeout so the app recovers faster? Why is it getting
stuck? I've tried adding op_timeout and connect_timeout to mongoid.yml,
but that doesn't seem to help.
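To make "reduce the timeout" concrete, this is roughly the behaviour I'm
after, as a minimal sketch using only Ruby's standard Timeout module. The
helper name with_deadline and the :timed_out return value are made up for
illustration and are not part of Mongoid or Moped; the sleep stands in for
the blocked socket read.

```ruby
require "timeout"

# Hypothetical helper (not a Mongoid or Moped API): run the block, but give
# up after `seconds` instead of waiting for the OS-level TCP timeout.
def with_deadline(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  # Tell the caller the deadline passed so the request can fail fast.
  :timed_out
end

# Stand-in for a blocked read; a real call would be a database operation.
result = with_deadline(0.1) { sleep 2 }
puts result.inspect
```

I realise Timeout interrupts the thread mid-operation, so a connection
abandoned this way probably should not be reused. It would only be a
stopgap, not a fix for whatever is making the read hang.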
Any help with this would be much appreciated.
Thanks.