PyMongo dropping connections to mongos randomly


I have a MongoDB cluster (2.6.3) with three mongos processes and two replica sets, with no sharding enabled.

In particular, I have 7 hosts (all are Ubuntu Server 14.04):

  • host1: mongos + client application
  • host2: mongos + client application
  • host3: mongos + client application
  • host4: RS1_Primary (or RS1_Secondary) and RS2_Arbiter
  • host5: RS1_Secondary (or RS1_Primary)
  • host6: RS2_Primary (or RS2_Secondary) and RS1_Arbiter
  • host7: RS2_Secondary (or RS2_Primary)

The client application is a Zato cluster with 4 gunicorn workers running on each server. Each worker accesses MongoDB through two pymongo.MongoClient instances, which are created as follows:

MongoClient(mongo_hosts, read_preference=ReadPreference.SECONDARY_PREFERRED, w=0, max_pool_size=25)
MongoClient(mongo_hosts, read_preference=ReadPreference.SECONDARY_PREFERRED, w=0, max_pool_size=10)

where mongo_hosts is 'host1:27017,host2:27017,host2:27017' on all servers.

So, in total, I have 12 MongoClient instances with max_pool_size=25 (4 on each server) and 12 others with max_pool_size=10 (also 4 on each server).
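For reference, the numbers above can be turned into a rough ceiling on pooled connections. This is only a back-of-the-envelope sketch: it assumes max_pool_size acts as a per-client cap on open connections, and the constant names are made up for illustration.

```python
# Rough upper bound on pooled connections implied by the setup described above.
# Assumption: max_pool_size caps the connections each MongoClient may hold.
HOSTS = 3                # host1..host3 each run one Zato server
WORKERS_PER_HOST = 4     # gunicorn workers per server
POOL_SIZES = (25, 10)    # max_pool_size of the two MongoClient instances per worker

# Each worker owns one client per pool size, so 3 * 4 = 12 clients per size.
clients_per_pool = HOSTS * WORKERS_PER_HOST

# 12 * 25 + 12 * 10 pooled connections across the whole platform.
max_connections = sum(clients_per_pool * size for size in POOL_SIZES)

print(clients_per_pool)   # 12
print(max_connections)    # 420
```

The 15-20 connections actually observed per mongos are far below this ceiling, so pool exhaustion looks unlikely to be the trigger.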

And my problem is:

When the Zato clusters are started and begin receiving requests (up to 10 requests/sec each, balanced using a simple round robin), a bunch of new connections are created, and around 15-20 of them then stay permanently open on each mongos.

However, at some random point and with no apparent cause, a couple of connections are suddenly dropped at the same time on all three mongos, and the total number of connections then keeps changing randomly until it stabilizes again some minutes later (between 5 and 10). While this happens, the performance of the platform is severely reduced, even though I see no slow queries in the MongoDB logs (neither in mongos nor in mongod).
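A minimal sketch of how these fluctuations could be recorded from the client side. It assumes `client` is a pymongo MongoClient connected to a single mongos and that the serverStatus command reports the usual connections.current and connections.available fields; the function names, sample count and interval are hypothetical.

```python
import time


def sample_connections(client):
    """Return (current, available) connection counts from serverStatus."""
    status = client.admin.command("serverStatus")
    conns = status["connections"]
    return conns["current"], conns["available"]


def watch_connections(client, samples=5, interval=1.0, sleep=time.sleep):
    """Collect (timestamp, current, available) tuples at a fixed interval."""
    history = []
    for i in range(samples):
        current, available = sample_connections(client)
        history.append((time.time(), current, available))
        # Do not wait after the last sample.
        if i < samples - 1:
            sleep(interval)
    return history
```

Running this against each mongos in parallel and comparing timestamps would show whether the drops really are simultaneous on all three.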

I have been isolating the problem and already tried to:

  • change the connection string to 'localhost:27017' in each MongoClient, to see whether the problem was in only one of the clients. The problem persisted, and it keeps affecting the three mongos at the same time, so it looks like something on the server side.
  • add log traces to make sure that the time is lost inside MongoClient. The result is that a simple find query clearly takes more than one second as measured on the client side, while it usually takes less than 10 ms. However, as I said before, I see no slow queries at all in the MongoDB logs (default slow-operation threshold: 100 ms).
  • monitor the platform activity to see if there is a load increase when this happens. There is none, and it can even happen during low-load periods.
  • monitor other variables on the servers, such as CPU usage or disk activity. I found nothing suspicious at all.
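The log traces mentioned above can be sketched roughly as follows. This is not the exact code used in the application: `timed_call`, the logger name and the 100 ms threshold are illustrative choices, and the wrapper works with any callable.

```python
import logging
import time

log = logging.getLogger("mongo_timing")


def timed_call(func, *args, slow_ms=100.0, **kwargs):
    """Call func, return (result, elapsed_ms) and log calls slower than slow_ms."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if elapsed_ms >= slow_ms:
        # Client-side latency includes time spent waiting for a pooled socket,
        # which the server-side slow query log cannot see.
        log.warning("slow call %r took %.1f ms", func, elapsed_ms)
    return result, elapsed_ms
```

Since PyMongo cursors are lazy, the wrapped call should be one that actually fetches data, for example `timed_call(collection.find_one, {"_id": some_id})`, where `collection` and `some_id` are placeholders.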

So, the questions at the end are:

  • Has anyone seen something similar (connections being dropped in PyMongo)?
  • What else can I look at to debug the problem?
  • Possible solution: MongoClient allows the definition of a max_pool_size, but I haven't found any reference to a min_pool_size. Is it possible to define one? Perhaps keeping the number of connections static would fix my performance problems.

Note about the MongoDB version: I am currently running MongoDB 2.6.3, but I already had this problem before upgrading from 2.6.1, so it was not introduced by the latest version.
