Fully loaded multi-tenant Django application with 1000's of WebSockets using Daphne/Channels, running fine for a few months and suddenly tenants all calling it the support line the application running slow or outright hanging. Narrowed it down to WebSockets as HTTP REST API hits came through fast and error free.
None of the application logs or OS logs indicate some issue, so only thing to go on is the exception noted below. It happened over and over again here and there throughout 2 days.
I don't expect any deep debugging help, just some off-the-cuff advice on possibilities.
AWS Linux 1
Python 3.6.4
Elasticache Redis 5.0
channels==2.4.0
channels-redis==2.4.2
daphne==2.5.0
Django==2.2.13
Split configuration HTTP served by uwsgi, daphne serves asgi, Nginx
May 10 08:08:16 prod-b-web1: [pid 15053] [version 119.5.10.5086] [tenant_id -] [domain_name -] [pathname /opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/daphne/server.py] [lineno 288] [priority ERROR] [funcname application_checker] [request_path -] [request_method -] [request_data -] [request_user -] [request_stack -] Exception inside application: Lock is not acquired.
Traceback (most recent call last):
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 435, in receive
real_channel
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 484, in receive_single
await self.receive_clean_locks.acquire(channel_key)
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 152, in acquire
return await self.locks[channel].acquire()
File "/opt/python3.6/lib/python3.6/asyncio/locks.py", line 176, in acquire
yield from fut
concurrent.futures._base.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/sessions.py", line 183, in __call__
return await self.inner(receive, self.send)
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/middleware.py", line 41, in coroutine_call
await inner_instance(receive, send)
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/consumer.py", line 59, in __call__
[receive, self.channel_receive], self.dispatch
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/utils.py", line 58, in await_many_dispatch
await task
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 447, in receive
self.receive_lock.release()
File "/opt/python3.6/lib/python3.6/asyncio/locks.py", line 201, in release
raise RuntimeError('Lock is not acquired.')
RuntimeError: Lock is not acquired.
First, lets have a look at the source of the
RuntimeError: Lock is not acquired.
error. As given by the traceback, therelease()
method in the file/opt/python3.6/lib/python3.6/asyncio/locks.py
is defined like so:A primitive lock is a synchronization primitive that is not owned by a particular thread when locked.
When attempting to release an unlocked lock by calling the
release()
method, theRuntimeError
will be raised, as the method should only be called in the locked state. The state changes to unlocked when called in the locked state.Now for the previous error raised in the
acquire()
method in the same file, theacquire()
method is defined like so:So in order for the
concurrent.futures._base.CancelledError
error you're getting to be raised, theawait fut
must've caused the issue.To fix it, you can have a look at Awaiting an asyncio.Future raises concurrent.futures._base.CancelledError instead of waiting for a value/exception to be set
Basically, you might have an awaitable in your code that you didn't await, and by not awaiting it, you never handed control back to the event loop or store the awaitable, causing it to be immediately cleaned up, completely cancelling it (and all of the awaitables it controlled).
Simply make sure you await the results of the awaitables in your code, finding any you missed.