I have a MongoDB cluster (2.6.3) with three mongos processes and two replica sets, with no sharding enabled.
In particular, I have 7 hosts (all are Ubuntu Server 14.04):
- host1: mongos + client application
- host2: mongos + client application
- host3: mongos + client application
- host4: RS1_Primary (or RS1_Secondary) and RS2_Arbiter
- host5: RS1_Secondary (or RS1_Primary)
- host6: RS2_Primary (or RS2_Secondary) and RS1_Arbiter
- host7: RS2_Secondary (or RS2_Primary)
The client application is a Zato cluster with 4 gunicorn workers running on each server, and each worker accesses MongoDB through two pymongo.MongoClient instances.
These MongoClient objects are created as follows:
from pymongo import MongoClient, ReadPreference
MongoClient(mongo_hosts, read_preference=ReadPreference.SECONDARY_PREFERRED, w=0, max_pool_size=25)
MongoClient(mongo_hosts, read_preference=ReadPreference.SECONDARY_PREFERRED, w=0, max_pool_size=10)
where mongo_hosts is 'host1:27017,host2:27017,host3:27017' on all servers.
So, in total, I have 12 MongoClient instances with max_pool_size=25 (4 on each server) and 12 others with max_pool_size=10 (also 4 on each server).
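To put those numbers in perspective, here is the worst-case socket count implied by the configuration above, as a small Python calculation. It assumes (my assumption, not something verified) that max_pool_size caps the sockets each MongoClient opens to the single mongos it is currently using, so these figures are ceilings rather than what is actually open at any moment.

```python
# Worked arithmetic for the deployment described above (values taken from the post).
SERVERS = 3              # host1..host3, each running mongos + Zato
WORKERS_PER_SERVER = 4   # gunicorn workers per server
POOL_SIZES = (25, 10)    # max_pool_size of the two MongoClient instances per worker

clients_per_size = SERVERS * WORKERS_PER_SERVER   # MongoClient instances of each size
per_worker = sum(POOL_SIZES)                      # socket ceiling per worker
per_server = WORKERS_PER_SERVER * per_worker      # socket ceiling per server
total = SERVERS * per_server                      # socket ceiling for the whole cluster

print(clients_per_size, per_worker, per_server, total)
```

This prints 12 35 140 420. The 15-20 connections actually observed per mongos are far below these ceilings.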
And my problem is:
When the Zato clusters start receiving requests (up to 10 req/s each, balanced with a simple round robin), a burst of new connections is created, and around 15-20 of them then stay open permanently on each mongos.
However, at some random point and with no apparent cause, a couple of connections are suddenly dropped at the same time on all three mongos, and the total number of connections then fluctuates randomly until it stabilizes again after 5 to 10 minutes.
While this happens, the performance of the platform is severely degraded, even though I see no slow queries in the MongoDB logs (neither in mongos nor in mongod).
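One way to pin down exactly when the drops happen is to sample each mongos's connection count and flag sudden decreases, then line those timestamps up with the slow client-side calls. The sketch below uses hypothetical helpers: fetch_status stands for any callable returning a serverStatus document (with PyMongo that could be lambda: client.admin.command("serverStatus") against one mongos), and only the connections.current field is read. The default min_drop of 2 is an arbitrary choice.

```python
import time


def sample_connections(fetch_status, samples, interval_s=1.0, clock=time.time):
    """Poll a serverStatus-like document; return (timestamp, connections.current) pairs."""
    out = []
    for i in range(samples):
        status = fetch_status()
        out.append((clock(), status["connections"]["current"]))
        if i < samples - 1:
            time.sleep(interval_s)
    return out


def find_drops(samples, min_drop=2):
    """Return (timestamp, before, after) for every step where the count fell by >= min_drop."""
    return [
        (ts, prev, cur)
        for (_, prev), (ts, cur) in zip(samples, samples[1:])
        if prev - cur >= min_drop
    ]


if __name__ == "__main__":
    # Fake fetcher standing in for client.admin.command("serverStatus").
    fake = iter([18, 18, 19, 15, 17, 11, 16])
    s = sample_connections(lambda: {"connections": {"current": next(fake)}}, 7, interval_s=0)
    print(find_drops(s))
```

Running one sampler per mongos with a shared clock makes it easy to check whether the drops really are simultaneous on all three.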
I have been trying to isolate the problem and have already tried to:
- change the connection string to 'localhost:27017' in each MongoClient, to see whether the problem was in only one of the clients. The problem persisted, and it still affects all three mongos at the same time, so it looks like something on the server side.
- add log traces to make sure that the time is lost inside MongoClient. The result is that a simple find query clearly takes more than one second on the client side, while it usually takes less than 10 ms. However, as I said before, I see no slow queries at all in the MongoDB logs (default slow-operation threshold: 100 ms).
- monitor platform activity to see whether load increases when this happens. It doesn't, and the problem can even occur during low-load periods.
- monitor other variables on the servers, such as CPU usage or disk activity. I found nothing suspicious at all.
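For the client-side traces, a small timing decorator along these lines logs any call slower than a threshold, defaulting to the same 100 ms used for slowms so that client and server views are comparable. log_if_slow, find_one_order and the logger name are hypothetical. One detail worth keeping in mind: PyMongo's find() returns a lazy cursor and the query is only sent when the cursor is iterated, so the wrapped function should consume the cursor (here with list()), otherwise the measurement covers only cursor construction.

```python
import functools
import logging
import time

log = logging.getLogger("mongo.timing")  # hypothetical logger name


def log_if_slow(threshold_s=0.1):
    """Decorator: log calls whose wall-clock time is >= threshold_s (0.1 s mirrors slowms=100)."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                if elapsed >= threshold_s:
                    log.warning("%s took %.1f ms", fn.__name__, elapsed * 1000)
                wrapper.last_elapsed = elapsed  # kept for inspection/tests
        wrapper.last_elapsed = None
        return wrapper
    return decorate


@log_if_slow()
def find_one_order(collection, order_id):
    # Hypothetical query; list() forces the cursor to actually hit the server.
    return list(collection.find({"_id": order_id}).limit(1))
```

time.perf_counter requires Python 3.3+; on Python 2, time.time() is a rough substitute.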
So, my questions are:
- Has anyone seen something similar (connections being dropped in PyMongo)?
- What else can I look at to debug the problem?
- Possible solution: MongoClient allows defining a max_pool_size, but I haven't found any reference to a min_pool_size. Is it possible to define one? Perhaps making the number of connections static would fix my performance problems.
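The idea of a connection floor can be illustrated with a small generic pool. This is a hypothetical class, not a PyMongo API: factory and close are placeholders for whatever opens and closes one connection, and acquire simply blocks when max_size is reached.

```python
import threading
from collections import deque


class MinMaxPool:
    """Hypothetical pool: opens min_size connections up front and never
    shrinks below that floor; grows on demand up to max_size."""

    def __init__(self, factory, min_size, max_size):
        if not 0 <= min_size <= max_size:
            raise ValueError("require 0 <= min_size <= max_size")
        self._factory = factory          # callable that opens one connection
        self._min = min_size
        self._max = max_size
        self._idle = deque(factory() for _ in range(min_size))  # pre-opened floor
        self._total = min_size           # idle + checked-out connections
        self._cond = threading.Condition()

    def acquire(self):
        """Return an idle connection, open a new one if under max_size, else wait."""
        with self._cond:
            while True:
                if self._idle:
                    return self._idle.popleft()
                if self._total < self._max:
                    self._total += 1     # reserve a slot before opening
                    break
                self._cond.wait()
        try:
            return self._factory()       # open outside the lock
        except Exception:
            with self._cond:
                self._total -= 1         # give the reserved slot back
                self._cond.notify()
            raise

    def release(self, conn):
        """Return a connection to the idle set and wake one waiter."""
        with self._cond:
            self._idle.append(conn)
            self._cond.notify()

    def shrink(self, close):
        """Close idle connections above the min_size floor; return how many."""
        closed = 0
        with self._cond:
            while self._idle and self._total > self._min:
                close(self._idle.pop())
                self._total -= 1
                closed += 1
        return closed

    @property
    def total(self):
        with self._cond:
            return self._total
```

A floor like this may mask the cost of reopening sockets rather than explain why they are dropped. Later PyMongo releases (3.x) document a minPoolSize option on MongoClient, which may be worth checking if upgrading is possible.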
Note about the MongoDB version: I am currently running MongoDB 2.6.3, but I already had this problem before upgrading from 2.6.1, so it was not introduced in the latest version.