Dotcloud supervisord shows error but process is running

3.4k Views Asked by At

My dotcloud setup (django-celery with rabbitmq) was working fine a week ago - the processes were starting up ok and the logs were clean. However, I recently repushed (without updating any of the code), and now the logs are saying that the processes fail to start even though they seem to be running.

Supervisord log

dotcloud@hack-default-www-0:/var/log/supervisor$ more supervisord.log
2012-06-03 10:51:51,836 CRIT Set uid to user 1000
2012-06-03 10:51:51,836 WARN Included extra file "/etc/supervisor/conf.d/uwsgi.c
onf" during parsing
2012-06-03 10:51:51,836 WARN Included extra file "/home/dotcloud/current/supervi
sord.conf" during parsing
2012-06-03 10:51:51,938 INFO RPC interface 'supervisor' initialized
2012-06-03 10:51:51,938 WARN cElementTree not installed, using slower XML parser
 for XML-RPC
2012-06-03 10:51:51,938 CRIT Server 'unix_http_server' running without any HTTP
authentication checking
2012-06-03 10:51:51,946 INFO daemonizing the supervisord process
2012-06-03 10:51:51,947 INFO supervisord started with pid 144
2012-06-03 10:51:53,128 INFO spawned: 'celerycam' with pid 159
2012-06-03 10:51:53,133 INFO spawned: 'apnsd' with pid 161
2012-06-03 10:51:53,148 INFO spawned: 'djcelery' with pid 164
2012-06-03 10:51:53,168 INFO spawned: 'uwsgi' with pid 167
2012-06-03 10:51:53,245 INFO exited: djcelery (exit status 1; not expected)
2012-06-03 10:51:53,247 INFO exited: celerycam (exit status 1; not expected)
2012-06-03 10:51:54,698 INFO spawned: 'celerycam' with pid 176
2012-06-03 10:51:54,698 INFO success: apnsd entered RUNNING state, process has s
tayed up for > than 1 seconds (startsecs)
2012-06-03 10:51:54,705 INFO spawned: 'djcelery' with pid 177
2012-06-03 10:51:54,706 INFO success: uwsgi entered RUNNING state, process has s
tayed up for > than 1 seconds (startsecs)
2012-06-03 10:51:54,731 INFO exited: djcelery (exit status 1; not expected)
2012-06-03 10:51:54,754 INFO exited: celerycam (exit status 1; not expected)
2012-06-03 10:51:56,760 INFO spawned: 'celerycam' with pid 178
2012-06-03 10:51:56,765 INFO spawned: 'djcelery' with pid 179
2012-06-03 10:51:56,790 INFO exited: celerycam (exit status 1; not expected)
2012-06-03 10:51:56,791 INFO exited: djcelery (exit status 1; not expected)
2012-06-03 10:51:59,798 INFO spawned: 'celerycam' with pid 180
2012-06-03 10:52:00,538 INFO spawned: 'djcelery' with pid 181
2012-06-03 10:52:00,565 INFO exited: celerycam (exit status 1; not expected)
2012-06-03 10:52:00,571 INFO gave up: celerycam entered FATAL state, too many st
art retries too quickly
2012-06-03 10:52:00,573 INFO exited: djcelery (exit status 1; not expected)
2012-06-03 10:52:01,575 INFO gave up: djcelery entered FATAL state, too many sta
rt retries too quickly
dotcloud@hack-default-www-0:/var/log/supervisor$

The djerror log:

dotcloud@hack-default-www-0:/var/log/supervisor$ more djcelery_error.log
Traceback (most recent call last):
  File "hack/manage.py", line 2, in 
    from django.core.management import execute_manager
ImportError: No module named django.core.management
Traceback (most recent call last):
  File "hack/manage.py", line 2, in 
    from django.core.management import execute_manager
ImportError: No module named django.core.management
Traceback (most recent call last):
  File "hack/manage.py", line 2, in 
    from django.core.management import execute_manager
ImportError: No module named django.core.management
Traceback (most recent call last):
  File "hack/manage.py", line 2, in 
    from django.core.management import execute_manager
ImportError: No module named django.core.management
dotcloud@hack-default-www-0:/var/log/supervisor$

The statusctrl shows that some processes are running, but the pids are different. Also, the celery functionality seems to be working ok. Messages are processed, and I can see the messages being processed in the django admin interface (dj celery cam is running).

# supervisorctl status
apnsd                            RUNNING    pid 225, uptime 0:00:44
celerycam                        RUNNING    pid 224, uptime 0:00:44
djcelery                         RUNNING    pid 226, uptime 0:00:44

Supervisord.conf file:

[program:djcelery]
directory = /home/dotcloud/current/
command = python hack/manage.py celeryd -E -l info -c 2
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log

[program:celerycam]
directory = /home/dotcloud/current/
command = python hack/manage.py celerycam
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log

http://jefurii.cafejosti.net/blog/2011/01/26/celery-in-virtualenv-with-supervisord/ says that the problem may be that the python being used is incorrect, so I've explicitly specified the python in the supervisord file. It now works, but it doesn't explain what I'm seeing above and why I've had to change my configuration when it was working fine last week.

Also, not all of the pids are lining up:

2012-06-03 11:19:03,045 CRIT Server 'unix_http_server' running without any HTTP
authentication checking
2012-06-03 11:19:03,051 INFO daemonizing the supervisord process
2012-06-03 11:19:03,052 INFO supervisord started with pid 144
2012-06-03 11:19:04,061 INFO spawned: 'celerycam' with pid 151
2012-06-03 11:19:04,066 INFO spawned: 'apnsd' with pid 153
2012-06-03 11:19:04,085 INFO spawned: 'djcelery' with pid 155
2012-06-03 11:19:04,104 INFO spawned: 'uwsgi' with pid 156
2012-06-03 11:19:05,271 INFO success: celerycam entered RUNNING state, process h
as stayed up for > than 1 seconds (startsecs)
2012-06-03 11:19:05,271 INFO success: apnsd entered RUNNING state, process has s
tayed up for > than 1 seconds (startsecs)
2012-06-03 11:19:05,271 INFO success: djcelery entered RUNNING state, process ha
s stayed up for > than 1 seconds (startsecs)
2012-06-03 11:19:05,271 INFO success: uwsgi entered RUNNING state, process has s
tayed up for > than 1 seconds (startsecs)

the status shows that the celery cam pids aren't lining up:

# supervisorctl status
apnsd                            RUNNING    pid 153, uptime 0:06:17
celerycam                        RUNNING    pid 150, uptime 0:06:17
djcelery                         RUNNING    pid 155, uptime 0:06:17
1

There are 1 best solutions below

4
On BEST ANSWER

My first guess is that your using the wrong python binary (system python, instead of virtualenv python), and it is causing this error (below) because that system python binary doesn't have that package installed.

  Traceback (most recent call last):
  File "hack/manage.py", line 2, in 
    from django.core.management import execute_manager
  ImportError: No module named django.core.management

You should change your supervisord.conf to the following to make sure you are pointing to the correct python version.

[program:djcelery]
directory = /home/dotcloud/current/
command = /home/dotcloud/env/bin/python hack/manage.py celeryd -E -l info -c 2
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log

[program:celerycam]
directory = /home/dotcloud/current/
command = /home/dotcloud/env/bin/python hack/manage.py celerycam
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log

The python path went fromt python to /home/dotcloud/env/bin/python.

I'm not sure why supervisor is saying it is running when it is not, but hopefully this one little change will help clear up your errors, and get everything back to working again.