Queue worker freezes unexpectedly and requires a restart to work again


I have 3 beanstalkd queues and each has its own worker. The workers are written in PHP and I am not using any framework; I am using pheanstalk to work with the queues.

For some reason the workers suddenly stop working, without any error, and no longer process the jobs in the queue. Once I restart the workers they run fine for the next couple of days, and then the same cycle repeats. Can someone please help me understand what could be the issue?

My worker

<?php

ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);

require 'vendor/autoload.php';

use Pheanstalk\Pheanstalk;


function function_name()
{
  $log_file = './log/error.log';
  $tube_name = 'current_tube_name';
  $memoryLimit = 128;
  $dotenv = Dotenv\Dotenv::createImmutable(__DIR__);
  $dotenv->load();
  $beanstalkd_host = $_ENV['BEANSTALKD_HOST'];
  $beanstalkd_port = $_ENV['BEANSTALKD_PORT'];

  while (true) {
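    // note: this opens a fresh connection and re-watches the tube on every iteration of the loop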
    try {
        $pheanstalk = Pheanstalk::create($beanstalkd_host, $beanstalkd_port);
        $pheanstalk->watch($tube_name)->ignore('default');
    }
    catch(\Exception $exception) {
        $error_message = "CRITICAL|" . __FILE__ . "|" . __LINE__ . "|" . "Failed to initiate queue worker; could not connect to beanstalkd.Error {$exception->getMessage()}";
        error_log($error_message, 3, $log_file);
        exit;
    }

    $job = $pheanstalk->reserveWithTimeout(10); // returns null if no job arrives within 10 seconds (Pheanstalk v4+ API)
    if (!$job) {
        sleep(5);
        continue; //move on to next iteration
    }

    $job_in_queue = $job->getData(); // raw job payload string
    $job_pay_load = json_decode($job_in_queue, true);
    if (!is_array($job_pay_load)) {
        // fall back: the payload may have been PHP-serialized before being queued
        $job_pay_load = json_decode(unserialize($job_in_queue), true);
    }

    if (!is_array($job_pay_load) || count($job_pay_load) === 0) {
        $error_message = "CRITICAL|" . __FILE__ . "|" . __LINE__ . "|" . "Job payload structure is not correct. Data: " . print_r($job_pay_load, true);
        error_log($error_message, 3, $log_file);
        $pheanstalk->bury($job);
        $job = null;
        $job_pay_load = null;
        continue;
    }

    //do the actual process
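    // ($val1 and $val2 below stand in for values produced by this processing step)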

    $payload = array(
        'key1' => $val1,
        'key2' => $val2
    );

    $next_tube_name = 'second_tube';
    $json_payload = json_encode($payload);
    $pheanstalk
        ->useTube($next_tube_name)
        ->put($json_payload);

    $pheanstalk->delete($job);

    $job = null;
    $json_payload = null;
    $job_pay_load = [];
    $job_in_queue = null;

    gc_collect_cycles();

    if((memory_get_usage() / 1024 / 1024) >= $memoryLimit) {
        $error_message = "We ran out of memory, restarting... in file ".__FILE__." on line" . __LINE__;
        error_log($error_message, 3, $log_file);
        exit;
    }
  }
}

function_name();

There is 1 answer below

Answered by Alister Bulman:

Beyond looking at the logs and seeing how the script itself is run, or re-run after it has exited, one thing does come to mind.

You do have a memory-use check, but the memory_get_usage() function doesn't tell the whole story - it has an optional parameter, bool $real_usage, which can be set to true. That returns the total amount of memory allocated to the script from the system, as opposed to what is currently being used by PHP.

Since you are checking in this code for memory usage > 128 MB, I'd think you also have the default PHP memory_limit setting of 128M - but if the amount of memory the script is using is already past that amount, it would most likely have failed and exited already. Depending on the amount of work you are actually doing, I'd reduce that $memoryLimit = 128; to around three-quarters of the limit - say no more than 92 (megabytes).
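For example, a minimal sketch of that check, reusing the $memoryLimit and $log_file variables from the worker above (the 92 MB figure is just the suggestion here, not a measured value):

$memoryLimit = 92; // megabytes - comfortably below a 128M PHP memory_limit

// memory_get_usage(true) reports memory allocated from the system,
// not just the portion PHP is actively using right now
$real_usage_mb = memory_get_usage(true) / 1024 / 1024;

if ($real_usage_mb >= $memoryLimit) {
    error_log("Memory threshold reached (" . round($real_usage_mb, 1) . " MB), exiting so the worker can be restarted\n", 3, $log_file);
    exit;
}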

Putting in a max-jobs count - how many loop iterations are allowed before the worker restarts itself - would be another way to stop any potential memory leaks from bringing things down.
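A rough sketch of that idea, assuming a hypothetical $max_jobs limit inside the worker's existing while loop and the same $log_file:

$max_jobs = 500; // illustrative limit, tune to your workload
$jobs_done = 0;

while (true) {
    // ... reserve, process and delete a job exactly as in the worker above ...

    $jobs_done++;
    if ($jobs_done >= $max_jobs) {
        error_log("Processed {$jobs_done} jobs, exiting for a clean restart\n", 3, $log_file);
        exit; // whatever supervises the worker is expected to start a fresh process
    }
}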

Another potential cause is failing to handle a connection issue with the daemon itself - if the reserve, or other calls made via the Pheanstalk library to Beanstalkd, were to fail.
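For instance, a hedged sketch of guarding the reserve call inside the loop, using the question's variable names; catching \Throwable is a deliberately broad, illustrative choice rather than Pheanstalk's specific exception classes:

try {
    $job = $pheanstalk->reserveWithTimeout(10);
} catch (\Throwable $e) {
    error_log("Reserve failed: {$e->getMessage()}, reconnecting\n", 3, $log_file);
    sleep(5);
    $pheanstalk = Pheanstalk::create($beanstalkd_host, $beanstalkd_port);
    $pheanstalk->watch($tube_name)->ignore('default');
    continue; // back to the top of the worker loop
}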

Ultimately, you have to make sure that the script is restarted automatically if you do deliberately - or accidentally - exit when you don't actually want to stop. How you do that depends on how you start the script running in the first place, and also on the version of the OS you are running (as that dictates what tools may be available, such as a systemd unit).
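As an illustration only, a minimal systemd unit along these lines - the service name, PHP binary path and worker path are all assumptions:

# /etc/systemd/system/queue-worker.service (hypothetical name and paths)
[Unit]
Description=Beanstalkd queue worker
After=network.target

[Service]
ExecStart=/usr/bin/php /path/to/worker.php
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Enabling it with systemctl enable --now queue-worker.service starts the worker and brings it back whenever it exits.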

Additional (temporary) logging of (almost) every action, along with the wider system status, would help narrow down where things may be going wrong. Info such as the memory used, the number of jobs already processed and how busy the wider system is can all provide useful information.
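For example, a small sketch of a per-job status line, assuming a hypothetical ./log/worker_debug.log file and the $jobs_done counter suggested above:

$debug_log = './log/worker_debug.log';
$load = sys_getloadavg(); // 1/5/15-minute load averages (not available on Windows)

error_log(
    sprintf(
        "%s|jobs=%d|mem=%.1fMB|real=%.1fMB|load=%s\n",
        date('c'),
        $jobs_done,
        memory_get_usage() / 1048576,
        memory_get_usage(true) / 1048576,
        $load ? implode(',', $load) : 'n/a'
    ),
    3,
    $debug_log
);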