I have a function that Celery Beat calls every 5 seconds, and it appends an element to a global variable. I expect the element to be added to the global variable every 5 seconds, but it only happens every 20 seconds.
Here is tasks.py:
# tasks.py
from celery import shared_task
from .celeryapp import app
from . import cfg

@shared_task
def update_a_global_list():
    try:
        if cfg.flag:
            cfg.init()
        l = ['first']
        cfg.my_global_var.append(l)
        print("my_global_var: " + str(cfg.my_global_var))
    except Exception as e:
        print(e)
The global variable is in a file called cfg.py:
# cfg.py
global my_global_var
global flag
flag = True

def init():
    global flag
    flag = False
    global my_global_var
    my_global_var = []
    print('Initialize Step')
The celery configuration of the project is in celeryapp.py:
# celeryapp.py
from __future__ import absolute_import
from celery import Celery
from celery.schedules import crontab
import os

os.environ.setdefault('FORKED_BY_MULTIPROCESSING', '1')

app = Celery('tasks',
             broker='amqp://shahab_user:pass1234@localhost:5672/shahab_vhost',
             backend='rpc://')

app.conf.beat_schedule = {
    'every-5-seconds': {
        'task': 'send_requests.tasks.update_a_global_list',
        'schedule': 5,
    },
}
When I run the command:
celery -A tasks worker -l INFO
in one terminal and run the command:
celery -A send_requests.celeryapp beat -l INFO
in another terminal, I see these logs in the worker terminal:
[2020-10-07 16:39:48,545: INFO/MainProcess] Received task:
send_requests.tasks.update_a_global_list[c24168e6-49df-4188-be2e-8ea05a563f2a]
[2020-10-07 16:39:48,545: WARNING/SpawnPoolWorker-1] Initialize Step <===
[2020-10-07 16:39:48,561: WARNING/SpawnPoolWorker-1] my_global_var: [['first']] <===
[2020-10-07 16:39:48,670: INFO/SpawnPoolWorker-1] Task
send_requests.tasks.update_a_global_list[c24168e6-49df-4188-be2e-8ea05a563f2a] succeeded in 0.125s :None
[2020-10-07 16:39:53,440: INFO/MainProcess] Received task:
send_requests.tasks.update_a_global_list[ce0ec733-be6d-4640-950f-a2f47ecf1693]
[2020-10-07 16:39:53,440: WARNING/SpawnPoolWorker-2] Initialize Step <===
[2020-10-07 16:39:53,440: WARNING/SpawnPoolWorker-2] my_global_var: [['first']] <===
[2020-10-07 16:39:53,547: INFO/SpawnPoolWorker-2] Task
send_requests.tasks.update_a_global_list[ce0ec733-be6d-4640-950f-a2f47ecf1693] succeeded in 0.0940000000409782s: None
[2020-10-07 16:39:58,450: INFO/MainProcess] Received task:
send_requests.tasks.update_a_global_list[fae64a0a-8132-4d88-bcd1-cc59d7e73794]
[2020-10-07 16:39:58,453: WARNING/SpawnPoolWorker-3] Initialize Step <===
[2020-10-07 16:39:58,454: WARNING/SpawnPoolWorker-3] my_global_var: [['first']] <===
[2020-10-07 16:39:58,532: INFO/SpawnPoolWorker-3] Task
send_requests.tasks.update_a_global_list[fae64a0a-8132-4d88-bcd1-cc59d7e73794] succeeded in 0.0779999999795109s: None
[2020-10-07 16:40:03,453: INFO/MainProcess] Received task:
send_requests.tasks.update_a_global_list[86d92236-aaf4-4c62-81d6-2679eed287b2]
[2020-10-07 16:40:03,453: WARNING/SpawnPoolWorker-4] Initialize Step <===
[2020-10-07 16:40:03,453: WARNING/SpawnPoolWorker-4] my_global_var: [['first']] <===
[2020-10-07 16:40:03,533: INFO/SpawnPoolWorker-4] Task
send_requests.tasks.update_a_global_list[86d92236-aaf4-4c62-81d6-2679eed287b2] succeeded in 0.0779999999795109s: None
[2020-10-07 16:40:08,467: INFO/MainProcess] Received task:
send_requests.tasks.update_a_global_list[1c9fb4e4-a364-4079-98fa-f6deeb3ad638]
[2020-10-07 16:40:08,469: WARNING/SpawnPoolWorker-1] my_global_var: [['first'], ['first']] <===
[2020-10-07 16:40:08,472: INFO/SpawnPoolWorker-1] Task
send_requests.tasks.update_a_global_list[1c9fb4e4-a364-4079-98fa-f6deeb3ad638] succeeded in 0.015999999945051968s: None
[2020-10-07 16:40:13,463: INFO/MainProcess] Received task:
send_requests.tasks.update_a_global_list[654d9da4-2c13-46c3-88fe-7b4e73edc8d0]
[2020-10-07 16:40:13,465: WARNING/SpawnPoolWorker-2] my_global_var: [['first'], ['first']] <===
[2020-10-07 16:40:13,468: INFO/SpawnPoolWorker-2] Task
send_requests.tasks.update_a_global_list[654d9da4-2c13-46c3-88fe-7b4e73edc8d0] succeeded in 0.0s:None
[2020-10-07 16:40:18,465: INFO/MainProcess] Received task:
send_requests.tasks.update_a_global_list[8d777ade-870b-4133-9f26-8467e4ca4ba7]
[2020-10-07 16:40:18,467: WARNING/SpawnPoolWorker-3] my_global_var: [['first'], ['first']] <===
[2020-10-07 16:40:18,470: INFO/SpawnPoolWorker-3] Task
send_requests.tasks.update_a_global_list[8d777ade-870b-4133-9f26-8467e4ca4ba7] succeeded in 0.0s:None
[2020-10-07 16:40:23,470: INFO/MainProcess] Received task:
send_requests.tasks.update_a_global_list[dc8bfa49-0f3e-48bc-9c4d-33705048ffce]
[2020-10-07 16:40:23,473: WARNING/SpawnPoolWorker-4] my_global_var: [['first'], ['first']] <===
[2020-10-07 16:40:23,476: INFO/SpawnPoolWorker-4] Task
send_requests.tasks.update_a_global_list[dc8bfa49-0f3e-48bc-9c4d-33705048ffce] succeeded in 0.0s:None
[2020-10-07 16:40:28,465: INFO/MainProcess] Received task:
send_requests.tasks.update_a_global_list[bad53432-03cb-4843-b71b-105c8d83c971]
[2020-10-07 16:40:28,467: WARNING/SpawnPoolWorker-1] my_global_var: [['first'], ['first'], ['first']] <===
[2020-10-07 16:40:28,470: INFO/SpawnPoolWorker-1] Task
send_requests.tasks.update_a_global_list[bad53432-03cb-4843-b71b-105c8d83c971] succeeded in 0.0s:None
Why does "Initialize Step" run more than once?
Why are there several different SpawnPoolWorker processes, and why don't they show the result I expected?
Thanks for your help.
[EDIT]: Following @DejanLekic's answer, I also tried the Django cache, but I got the same results.
This time I wrote the task like this:
from celery import shared_task
from django.core.cache import cache

@shared_task
def update_global_list():
    try:
        test = []
        l = ['first']
        cached_object = cache.get('my_global_var')
        if cached_object is None:
            # First run: seed the cache with an empty list
            cache.set('my_global_var', test)
            cached_object = cache.get('my_global_var')
        cached_object.append(l)
        cache.set('my_global_var', cached_object)
        print("my_global_var : " + str(cache.get('my_global_var')))
    except Exception as e:
        print(e)
And I got these results:
[2020-10-09 14:11:45,903: INFO/MainProcess] Received task:
send_requests.tasks.update_global_list[93264e6e-3cd0-4068-ac9e-98f366cdfd51]
[2020-10-09 14:11:45,907: WARNING/SpawnPoolWorker-1] my_global_var :
[['first']]
[2020-10-09 14:11:45,963: INFO/SpawnPoolWorker-1] Task
send_requests.tasks.update_global_list[93264e6e-3cd0-4068-ac9e-98f366cdfd51]
succeeded in 0.0470000000204891s: None
[2020-10-09 14:11:50,835: INFO/MainProcess] Received task:
send_requests.tasks.update_global_list[0673a0a8-8b79-4325-a9aa-015061f76166]
[2020-10-09 14:11:50,838: WARNING/SpawnPoolWorker-2] my_global_var :
[['first']]
[2020-10-09 14:11:50,887: INFO/SpawnPoolWorker-2] Task
send_requests.tasks.update_global_list[0673a0a8-8b79-4325-a9aa-015061f76166]
succeeded in 0.0470000000204891s: None
[2020-10-09 14:11:55,830: INFO/MainProcess] Received task:
send_requests.tasks.update_global_list[2c58b9b3-ae37-4b8a-b28f-8e4dce2952aa]
[2020-10-09 14:11:55,834: WARNING/SpawnPoolWorker-3] my_global_var :
[['first']]
[2020-10-09 14:11:55,884: INFO/SpawnPoolWorker-3] Task
send_requests.tasks.update_global_list[2c58b9b3-ae37-4b8a-b28f-8e4dce2952aa]
succeeded in 0.0470000000204891s: None
[2020-10-09 14:12:00,829: INFO/MainProcess] Received task:
send_requests.tasks.update_global_list[2fc7a1a9-27e7-41dc-936d-a017d2a283bc]
[2020-10-09 14:12:00,833: WARNING/SpawnPoolWorker-4] my_global_var :
[['first']]
[2020-10-09 14:12:00,951: INFO/SpawnPoolWorker-4] Task
send_requests.tasks.update_global_list[2fc7a1a9-27e7-41dc-936d-a017d2a283bc]
succeeded in 0.125s:None
[2020-10-09 14:12:05,838: INFO/MainProcess] Received task:
send_requests.tasks.update_global_list[d382a795-8041-4331-a9ca-d4d74b2c8982]
[2020-10-09 14:12:05,840: WARNING/SpawnPoolWorker-1] my_global_var :
[['first'], ['first']]
[2020-10-09 14:12:05,842: INFO/SpawnPoolWorker-1] Task
send_requests.tasks.update_global_list[d382a795-8041-4331-a9ca-d4d74b2c8982]
succeeded in 0.015999999945051968s: None
Looks like different worker processes are not in sync with each other. I'm totally confused. How can I sync them?
Using global variables in a distributed environment is just inviting trouble. It may work if you use a single Celery worker with threads as the concurrency type. The typical solution is to use a caching server (Redis, memcached, or similar).
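For the Django cache attempt in the question, one thing worth checking is which backend is configured. Django's default is the local-memory cache, which is private to each process, so it would behave just like the module-level global. Below is a minimal sketch of a settings fragment that points at a shared Redis server instead. It assumes Django 4.0 or newer (where this built-in backend exists), the redis Python package, and a Redis server at the hypothetical address 127.0.0.1:6379:

```python
# settings.py (fragment)
# Assumptions: Django >= 4.0, the redis Python package installed, and a Redis
# server reachable at this (hypothetical) address.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    }
}
```

With a shared backend like this, every worker process reads and writes the same my_global_var key. Note that a cache.get followed by a cache.set is still not atomic, so overlapping tasks could lose an update.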
Why? Each worker process has its own copy of my_global_var, so when a task that appends to it runs, it only modifies the my_global_var inside that particular worker process.
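This also matches the 20-second pattern in the logs: with four pool processes taking turns, each private copy is touched once every 4 × 5 = 20 seconds. Here is a toy simulation of that in plain Python (it is not Celery; the FakeWorkerProcess class, the simulate helper, and the round-robin dispatch are assumptions modelled on the log output):

```python
# Toy model of the worker log: NOT Celery itself, just plain Python objects.
# Assumption: a pool of 4 worker processes with tasks handed out round-robin.

class FakeWorkerProcess:
    """Stands in for one pool process; each has its own private 'global'."""

    def __init__(self, name):
        self.name = name
        self.my_global_var = None  # not initialised until the first task runs here

    def update_a_global_list(self):
        if self.my_global_var is None:  # plays the role of cfg.flag / cfg.init()
            self.my_global_var = []
        self.my_global_var.append(['first'])
        return list(self.my_global_var)


def simulate(num_workers=4, num_tasks=9, interval_s=5):
    """Return (time_in_seconds, worker_name, list_length) for each task run."""
    workers = [FakeWorkerProcess(f"SpawnPoolWorker-{i + 1}") for i in range(num_workers)]
    history = []
    for n in range(num_tasks):
        worker = workers[n % num_workers]  # round-robin dispatch (assumed)
        seen = worker.update_a_global_list()
        history.append((n * interval_s, worker.name, len(seen)))
    return history


for t, name, length in simulate():
    print(f"t={t:>2}s  {name}: list length {length}")
```

In this model SpawnPoolWorker-1 sees its list grow at t = 0, 20, and 40 seconds, while the other three workers each start from an empty list of their own.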