I am using Celery for tasks in Python and I have already tried many things to optimize my code. Before explaining the problem I am facing, let me show you my code samples. I have a db.py file that contains my document classes and the MongoDB connection, shown below (MongoDB is provided by a third party, so it lives in a remote location):
from mongoengine import connect, Document, StringField, DictField, ListField

connect(host=MONGO_URI, connect=False, serverSelectionTimeoutMS=150000, maxPoolSize=300)

class Roles(Document):
    name = StringField()
    roles = DictField()

class Customer(Document):
    customerid = StringField()
    customername = StringField()
    customer_apikey = StringField()
    service_ids = ListField(StringField())
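The get_roles() and get_customer() helpers called in task.py below essentially just query these documents. The exact queries shown here are a simplified sketch (the real helpers have more logic), but the MongoEngine lookups are the relevant part:

def get_roles():
    # The first MongoEngine query in a fresh worker is what triggers the
    # actual MongoDB connection, because connect=False defers it until first use.
    return Roles.objects.first()

def get_customer():
    # By the time this runs the connection pool is already open,
    # so it comes back quickly.
    return Customer.objects.first()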
Then I have my task.py as follows:
import celery
from biyosecure_dto_v2 import *

app = celery.Celery('tasker')
app.conf.update(broker_url=redis_uri, result_backend=redis_uri)

@app.task
def sample_task(x, y):
    get_roles()
    get_customer()
    result = make_request_to_third_party_api()
    return result
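For reference, under load these tasks are queued from the API side with something like this (simplified; the real call sites pass request-specific arguments):

from task import sample_task

# Each incoming request enqueues a task roughly like this;
# under load this happens 200+ times per second.
async_result = sample_task.delay("some-x", "some-y")
# async_result.get(timeout=...) if the caller needs the third-party response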
I run everything in Docker containers and start Celery with the command below to make it concurrent:
["celery" ,"-A","task","worker","--loglevel=info", "-c", "1000", "-P", "eventlet"]
Problem: When the code comes under load, I need to be able to make 200 or more requests per second to the third-party provider. Celery picks tasks up from Redis quickly and tries to run them concurrently. The problem arises from the initialization of the MongoDB connection inside get_roles(): it takes much longer than get_customer(), even though both functions are optimized and fast, and only the first MongoDB-related operation is slow. I tried to fix this by increasing maxPoolSize to 300 in PyMongo, and it improved the speed, but I wonder if there is a better way, maybe by using one connection for all tasks? How should I do it? Is it even possible?
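To make that last question concrete, the kind of thing I have in mind is warming the connection once when the worker starts instead of on the first task, roughly like the sketch below (using Celery's worker_ready signal and MongoEngine's get_db(); I am not sure this is the right approach, which is why I am asking):

from celery.signals import worker_ready
from mongoengine.connection import get_db

@worker_ready.connect
def warm_mongo_connection(**kwargs):
    # Sketch only: run one cheap command when the worker becomes ready so
    # PyMongo opens its connection pool now, instead of during the first
    # Roles/Customer query inside a task.
    get_db().command("ping")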