Setup: CentOS 7, Celery 4.1, RabbitMQ (broker), Redis (backend).
We have ~15 workers on 3 machines, each with a concurrency of 1 to 5.
We recently upgraded from Celery 3.1, and workers now go offline at random with no exception or error. According to the worker logs, the tasks finish successfully.
The machines are monitored and this is not an overload issue (CPU and memory are OK). Any idea what it could be? What could cause the workers to randomly stop working? Any suggestions on how to debug this?
BTW: we see the offline workers via Flower.
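To cross-check what Flower reports, something like the following could be run from a Python shell. It is only a minimal sketch: `find_silent_workers` and the worker names are hypothetical, `app` is assumed to be the project's own Celery application object, and the only Celery call used is `app.control.ping()`, which broadcasts a ping and collects the replies.

```python
def find_silent_workers(app, expected_workers, timeout=5.0):
    """Return the expected worker names that did not answer a broadcast ping.

    `app` is assumed to be the project's Celery application;
    `expected_workers` is a hypothetical list of node names
    such as "celery@host1".
    """
    # app.control.ping() returns a list of single-entry dicts,
    # e.g. [{"celery@host1": {"ok": "pong"}}]; it may be empty
    # when nobody replies within the timeout.
    replies = app.control.ping(timeout=timeout) or []

    alive = set()
    for reply in replies:
        for name, payload in reply.items():
            if payload.get("ok") == "pong":
                alive.add(name)

    # Anything expected but not seen is reported, in a stable order.
    return sorted(set(expected_workers) - alive)
```

Usage would look like `find_silent_workers(app, ["celery@host1", "celery@host2"])`. If a worker shown as offline in Flower still answers the ping, that would suggest the problem is in event or heartbeat delivery rather than in the worker process itself; if it does not answer while the process is still running, the worker is more likely stuck.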
Thanks