My dotcloud setup (django-celery with rabbitmq) was working fine a week ago - the processes were starting up ok and the logs were clean. However, I recently repushed (without updating any of the code), and now the logs are saying that the processes fail to start even though they seem to be running.
Supervisord log
dotcloud@hack-default-www-0:/var/log/supervisor$ more supervisord.log 2012-06-03 10:51:51,836 CRIT Set uid to user 1000 2012-06-03 10:51:51,836 WARN Included extra file "/etc/supervisor/conf.d/uwsgi.c onf" during parsing 2012-06-03 10:51:51,836 WARN Included extra file "/home/dotcloud/current/supervi sord.conf" during parsing 2012-06-03 10:51:51,938 INFO RPC interface 'supervisor' initialized 2012-06-03 10:51:51,938 WARN cElementTree not installed, using slower XML parser for XML-RPC 2012-06-03 10:51:51,938 CRIT Server 'unix_http_server' running without any HTTP authentication checking 2012-06-03 10:51:51,946 INFO daemonizing the supervisord process 2012-06-03 10:51:51,947 INFO supervisord started with pid 144 2012-06-03 10:51:53,128 INFO spawned: 'celerycam' with pid 159 2012-06-03 10:51:53,133 INFO spawned: 'apnsd' with pid 161 2012-06-03 10:51:53,148 INFO spawned: 'djcelery' with pid 164 2012-06-03 10:51:53,168 INFO spawned: 'uwsgi' with pid 167 2012-06-03 10:51:53,245 INFO exited: djcelery (exit status 1; not expected) 2012-06-03 10:51:53,247 INFO exited: celerycam (exit status 1; not expected) 2012-06-03 10:51:54,698 INFO spawned: 'celerycam' with pid 176 2012-06-03 10:51:54,698 INFO success: apnsd entered RUNNING state, process has s tayed up for > than 1 seconds (startsecs) 2012-06-03 10:51:54,705 INFO spawned: 'djcelery' with pid 177 2012-06-03 10:51:54,706 INFO success: uwsgi entered RUNNING state, process has s tayed up for > than 1 seconds (startsecs) 2012-06-03 10:51:54,731 INFO exited: djcelery (exit status 1; not expected) 2012-06-03 10:51:54,754 INFO exited: celerycam (exit status 1; not expected) 2012-06-03 10:51:56,760 INFO spawned: 'celerycam' with pid 178 2012-06-03 10:51:56,765 INFO spawned: 'djcelery' with pid 179 2012-06-03 10:51:56,790 INFO exited: celerycam (exit status 1; not expected) 2012-06-03 10:51:56,791 INFO exited: djcelery (exit status 1; not expected) 2012-06-03 10:51:59,798 INFO spawned: 'celerycam' with pid 180 2012-06-03 10:52:00,538 INFO spawned: 'djcelery' with pid 181 2012-06-03 10:52:00,565 INFO exited: celerycam (exit status 1; not expected) 2012-06-03 10:52:00,571 INFO gave up: celerycam entered FATAL state, too many st art retries too quickly 2012-06-03 10:52:00,573 INFO exited: djcelery (exit status 1; not expected) 2012-06-03 10:52:01,575 INFO gave up: djcelery entered FATAL state, too many sta rt retries too quickly dotcloud@hack-default-www-0:/var/log/supervisor$
The djerror log:
dotcloud@hack-default-www-0:/var/log/supervisor$ more djcelery_error.log Traceback (most recent call last): File "hack/manage.py", line 2, in from django.core.management import execute_manager ImportError: No module named django.core.management Traceback (most recent call last): File "hack/manage.py", line 2, in from django.core.management import execute_manager ImportError: No module named django.core.management Traceback (most recent call last): File "hack/manage.py", line 2, in from django.core.management import execute_manager ImportError: No module named django.core.management Traceback (most recent call last): File "hack/manage.py", line 2, in from django.core.management import execute_manager ImportError: No module named django.core.management dotcloud@hack-default-www-0:/var/log/supervisor$
The statusctrl shows that some processes are running, but the pids are different. Also, the celery functionality seems to be working ok. Messages are processed, and I can see the messages being processed in the django admin interface (dj celery cam is running).
# supervisorctl status apnsd RUNNING pid 225, uptime 0:00:44 celerycam RUNNING pid 224, uptime 0:00:44 djcelery RUNNING pid 226, uptime 0:00:44
Supervisord.conf file:
[program:djcelery] directory = /home/dotcloud/current/ command = python hack/manage.py celeryd -E -l info -c 2 stderr_logfile = /var/log/supervisor/%(program_name)s_error.log stdout_logfile = /var/log/supervisor/%(program_name)s.log [program:celerycam] directory = /home/dotcloud/current/ command = python hack/manage.py celerycam stderr_logfile = /var/log/supervisor/%(program_name)s_error.log stdout_logfile = /var/log/supervisor/%(program_name)s.log
http://jefurii.cafejosti.net/blog/2011/01/26/celery-in-virtualenv-with-supervisord/ says that the problem may be that the python being used is incorrect, so I've explicitly specified the python in the supervisord file. It now works, but it doesn't explain what I'm seeing above and why I've had to change my configuration when it was working fine last week.
Also, not all of the pids are lining up:
2012-06-03 11:19:03,045 CRIT Server 'unix_http_server' running without any HTTP authentication checking 2012-06-03 11:19:03,051 INFO daemonizing the supervisord process 2012-06-03 11:19:03,052 INFO supervisord started with pid 144 2012-06-03 11:19:04,061 INFO spawned: 'celerycam' with pid 151 2012-06-03 11:19:04,066 INFO spawned: 'apnsd' with pid 153 2012-06-03 11:19:04,085 INFO spawned: 'djcelery' with pid 155 2012-06-03 11:19:04,104 INFO spawned: 'uwsgi' with pid 156 2012-06-03 11:19:05,271 INFO success: celerycam entered RUNNING state, process h as stayed up for > than 1 seconds (startsecs) 2012-06-03 11:19:05,271 INFO success: apnsd entered RUNNING state, process has s tayed up for > than 1 seconds (startsecs) 2012-06-03 11:19:05,271 INFO success: djcelery entered RUNNING state, process ha s stayed up for > than 1 seconds (startsecs) 2012-06-03 11:19:05,271 INFO success: uwsgi entered RUNNING state, process has s tayed up for > than 1 seconds (startsecs)
the status shows that the celery cam pids aren't lining up:
# supervisorctl status apnsd RUNNING pid 153, uptime 0:06:17 celerycam RUNNING pid 150, uptime 0:06:17 djcelery RUNNING pid 155, uptime 0:06:17
My first guess is that your using the wrong python binary (
system
python, instead ofvirtualenv
python), and it is causing this error (below) because thatsystem
python binary doesn't have that package installed.You should change your supervisord.conf to the following to make sure you are pointing to the correct python version.
The python path went fromt
python
to/home/dotcloud/env/bin/python
.I'm not sure why supervisor is saying it is running when it is not, but hopefully this one little change will help clear up your errors, and get everything back to working again.