Nodejs cluster not taking concurrent calls correctly

203 Views Asked by At

[Edit: I added some profiling logs and process exit traces to the code, and pasted the results at the end of the post]

I can't get Nodejs cluster to properly make use of the server's available CPUs while using port sharing. The problem is observed on two different servers (personal computer and remote VPS), as indicated below, and it has been reproduced by a commenter (see comments from jfriend00).

Node Version: v10.13.0; Windows 7 64 bits; 8 cores

Node Version: v8.1.3; CentOS 6.9; 4 cores

The bench is as follows: a worker is created for each available CPU, listens on port 8000 (using Express) and replies after a simulated processing time of 2 seconds (the process is supposed totally synced for the benching, with no async operation).

cluster_bench.js

var cluster = require('cluster');
var express = require('express');

if(cluster.isMaster) {
  var numWorkers = require('os').cpus().length; 
  console.log('[' + new Date().toISOString() + '] Master cluster setting up ' + numWorkers + ' workers...');

  for(var i = 0; i < numWorkers; i++) {
    cluster.fork();
  }

  cluster.on('online', function(worker) {
    console.log('[' + new Date().toISOString() + '] Worker ' + (''+worker.process.pid).padStart(6, ' ') + ' is online');
  });

  cluster.on('exit', function(worker, code, signal) {
    console.log('[' + new Date().toISOString() + '] Worker ' + (''+worker.process.pid).padStart(6, ' ') + ' died with code: ' + code + ', and signal: ' + signal);
    console.log('[' + new Date().toISOString() + '] Starting a new worker');
    cluster.fork();
  });
} else {
  var app = express();
  const pid = (''+process.pid).padStart(6, ' ');
  app.all('/*', function(req, res) {
    const requestNum = req.query.num;
    console.log('[' + new Date().toISOString() + '] Process ' + pid + ' handles query #' + requestNum);
    // simulates a 10s-long process
    const now = new Date().getTime();
    let waittime = 2000; // 2 seconds
    while (new Date().getTime() < now + waittime) { /* simulated synced processing */ };
    console.log('[' + new Date().toISOString() + '] Process ' + pid + ' sends answer to query #' + requestNum);
    res.send('Process ' + pid + ' says hello!');
  });
  var port = 8000;
  var server = app.listen(port, function() {
    console.log('[' + new Date().toISOString() + '] Process ' + pid + ' is listening on port ' + port);
  });
}

Now I bench this script with 32 concurrent requests:

cluster_client.js

const http = require('http');
// Send 32 concurrent calls
for(var i=0; i<32; i++) {
  let id = (''+i).padStart(2, ' ');
  console.log('[' + new Date().toISOString() + '] Request ' + id + ' launched.');
  http.get('http://localhost:8000?num=' + id, (resp) => {
    let data = '';
    resp.on('data', chunk => data += chunk );
    resp.on('end', () => console.log('[' + new Date().toISOString() + '] Request ' + id + ' - answer: ' + data) );
  });
}

I would expect my calls to be processed by batches of n-CPUs.

Yet, I observe this behavior:

  • first batch actually makes use of the n CPUs
  • later batches make use of less and less CPUs
  • last batches only make use of only one CPU, so the last requests are answered each after the other, in a very long time

For example, on a 8-core computer: first 8 requests processed at once, then another 8 at once, then 4 at once, 4 at once, 2 at once, 2 at once, 2 at once, 1 at once, 1 at once.

On Windows, if I launch my client script a second time without restarting the server, then only 2 of the existing workers will handle the 32 requests. On CentOS, the behavior is the same as the first launch (first using all CPUs, then less and less).

No 'exit' event is produced by any workers. I also placed hooks with logs on the child processes events (all events described in https://nodejs.org/api/process.html#process_process_events), and observed that no event of any kind was emitted for any of the child processes. Also, jfriend00 has checked the actual sending of all the requests by the client with WireShark (see comment), which are correctly sent.

How can this behavior be?


Here's the traces I get from the server:

[2019-08-30T20:57:42.078Z] Master cluster setting up 8 workers...
[2019-08-30T20:57:42.781Z] Worker   1732 is online
[2019-08-30T20:57:42.798Z] Worker    208 is online
[2019-08-30T20:57:42.866Z] Worker  10868 is online
[2019-08-30T20:57:43.032Z] Worker  10232 is online
[2019-08-30T20:57:43.044Z] Worker  12356 is online
[2019-08-30T20:57:43.113Z] Worker  12500 is online
[2019-08-30T20:57:43.208Z] Worker   1196 is online
[2019-08-30T20:57:43.315Z] Worker  10948 is online
[2019-08-30T20:57:43.738Z] Process   1732 is listening on port 8000
[2019-08-30T20:57:43.779Z] Process    208 is listening on port 8000
[2019-08-30T20:57:43.891Z] Process  10868 is listening on port 8000
[2019-08-30T20:57:44.018Z] Process  10232 is listening on port 8000
[2019-08-30T20:57:44.018Z] Process  12356 is listening on port 8000
[2019-08-30T20:57:44.039Z] Process  12500 is listening on port 8000
[2019-08-30T20:57:44.097Z] Process   1196 is listening on port 8000
[2019-08-30T20:57:44.159Z] Process  10948 is listening on port 8000
[2019-08-30T20:57:49.570Z] Process  10232 handles query # 5
[2019-08-30T20:57:49.571Z] Process  10948 handles query # 0
[2019-08-30T20:57:49.596Z] Process   1732 handles query #10
[2019-08-30T20:57:49.607Z] Process  10868 handles query # 7
[2019-08-30T20:57:49.610Z] Process   1196 handles query # 2
[2019-08-30T20:57:49.611Z] Process    208 handles query # 8
[2019-08-30T20:57:49.613Z] Process  12356 handles query # 6
[2019-08-30T20:57:49.628Z] Process  12500 handles query # 4
[2019-08-30T20:57:51.586Z] Process  10232 sends answer to query # 5
[2019-08-30T20:57:51.613Z] Process  10948 sends answer to query # 0
[2019-08-30T20:57:51.620Z] Process   1732 sends answer to query #10
[2019-08-30T20:57:51.655Z] Process  10868 sends answer to query # 7
[2019-08-30T20:57:51.656Z] Process   1196 sends answer to query # 2
[2019-08-30T20:57:51.666Z] Process  10232 handles query #14
[2019-08-30T20:57:51.684Z] Process    208 sends answer to query # 8
[2019-08-30T20:57:51.693Z] Process   1732 handles query #31
[2019-08-30T20:57:51.695Z] Process  10948 handles query # 1
[2019-08-30T20:57:51.734Z] Process  12356 sends answer to query # 6
[2019-08-30T20:57:51.755Z] Process  12500 sends answer to query # 4
[2019-08-30T20:57:51.770Z] Process   1196 handles query # 3
[2019-08-30T20:57:51.774Z] Process  10868 handles query #12
[2019-08-30T20:57:51.815Z] Process    208 handles query #17
[2019-08-30T20:57:51.841Z] Process  12356 handles query # 9
[2019-08-30T20:57:51.885Z] Process  12500 handles query #13
[2019-08-30T20:57:53.671Z] Process  10232 sends answer to query #14
[2019-08-30T20:57:53.697Z] Process   1732 sends answer to query #31
[2019-08-30T20:57:53.699Z] Process  10948 sends answer to query # 1
[2019-08-30T20:57:53.709Z] Process  10948 handles query #11
[2019-08-30T20:57:53.771Z] Process   1196 sends answer to query # 3
[2019-08-30T20:57:53.781Z] Process  10868 sends answer to query #12
[2019-08-30T20:57:53.780Z] Process   1196 handles query #15
[2019-08-30T20:57:53.816Z] Process    208 sends answer to query #17
[2019-08-30T20:57:53.848Z] Process  12356 sends answer to query # 9
[2019-08-30T20:57:53.856Z] Process  12356 handles query #16
[2019-08-30T20:57:53.886Z] Process  12500 sends answer to query #13
[2019-08-30T20:57:53.895Z] Process  12500 handles query #19
[2019-08-30T20:57:55.712Z] Process  10948 sends answer to query #11
[2019-08-30T20:57:55.718Z] Process  10948 handles query #18
[2019-08-30T20:57:55.782Z] Process   1196 sends answer to query #15
[2019-08-30T20:57:55.788Z] Process   1196 handles query #21
[2019-08-30T20:57:55.856Z] Process  12356 sends answer to query #16
[2019-08-30T20:57:55.862Z] Process  12356 handles query #20
[2019-08-30T20:57:55.896Z] Process  12500 sends answer to query #19
[2019-08-30T20:57:55.901Z] Process  12500 handles query #23
[2019-08-30T20:57:57.719Z] Process  10948 sends answer to query #18
[2019-08-30T20:57:57.789Z] Process   1196 sends answer to query #21
[2019-08-30T20:57:57.863Z] Process  12356 sends answer to query #20
[2019-08-30T20:57:57.889Z] Process  12356 handles query #22
[2019-08-30T20:57:57.901Z] Process  12500 sends answer to query #23
[2019-08-30T20:57:57.919Z] Process  12500 handles query #25
[2019-08-30T20:57:59.889Z] Process  12356 sends answer to query #22
[2019-08-30T20:57:59.891Z] Process  12356 handles query #24
[2019-08-30T20:57:59.920Z] Process  12500 sends answer to query #25
[2019-08-30T20:57:59.922Z] Process  12500 handles query #27
[2019-08-30T20:58:01.892Z] Process  12356 sends answer to query #24
[2019-08-30T20:58:01.894Z] Process  12356 handles query #26
[2019-08-30T20:58:01.923Z] Process  12500 sends answer to query #27
[2019-08-30T20:58:01.926Z] Process  12500 handles query #29
[2019-08-30T20:58:03.895Z] Process  12356 sends answer to query #26
[2019-08-30T20:58:03.899Z] Process  12356 handles query #28
[2019-08-30T20:58:03.926Z] Process  12500 sends answer to query #29
[2019-08-30T20:58:05.901Z] Process  12356 sends answer to query #28
[2019-08-30T20:58:05.903Z] Process  12356 handles query #30
[2019-08-30T20:58:07.904Z] Process  12356 sends answer to query #30

and from the client side:

[2019-08-30T20:57:49.293Z] Request  0 launched.
[2019-08-30T20:57:49.376Z] Request  1 launched.
[2019-08-30T20:57:49.380Z] Request  2 launched.
[2019-08-30T20:57:49.381Z] Request  3 launched.
[2019-08-30T20:57:49.384Z] Request  4 launched.
[2019-08-30T20:57:49.387Z] Request  5 launched.
[2019-08-30T20:57:49.390Z] Request  6 launched.
[2019-08-30T20:57:49.393Z] Request  7 launched.
[2019-08-30T20:57:49.396Z] Request  8 launched.
[2019-08-30T20:57:49.404Z] Request  9 launched.
[2019-08-30T20:57:49.407Z] Request 10 launched.
[2019-08-30T20:57:49.410Z] Request 11 launched.
[2019-08-30T20:57:49.413Z] Request 12 launched.
[2019-08-30T20:57:49.421Z] Request 13 launched.
[2019-08-30T20:57:49.425Z] Request 14 launched.
[2019-08-30T20:57:49.427Z] Request 15 launched.
[2019-08-30T20:57:49.435Z] Request 16 launched.
[2019-08-30T20:57:49.438Z] Request 17 launched.
[2019-08-30T20:57:49.441Z] Request 18 launched.
[2019-08-30T20:57:49.444Z] Request 19 launched.
[2019-08-30T20:57:49.452Z] Request 20 launched.
[2019-08-30T20:57:49.455Z] Request 21 launched.
[2019-08-30T20:57:49.458Z] Request 22 launched.
[2019-08-30T20:57:49.461Z] Request 23 launched.
[2019-08-30T20:57:49.469Z] Request 24 launched.
[2019-08-30T20:57:49.472Z] Request 25 launched.
[2019-08-30T20:57:49.475Z] Request 26 launched.
[2019-08-30T20:57:49.479Z] Request 27 launched.
[2019-08-30T20:57:49.487Z] Request 28 launched.
[2019-08-30T20:57:49.489Z] Request 29 launched.
[2019-08-30T20:57:49.493Z] Request 30 launched.
[2019-08-30T20:57:49.496Z] Request 31 launched.
[2019-08-30T20:57:51.637Z] Request  5 - answer: Process  10232 says hello!
[2019-08-30T20:57:51.651Z] Request  0 - answer: Process  10948 says hello!
[2019-08-30T20:57:51.654Z] Request 10 - answer: Process   1732 says hello!
[2019-08-30T20:57:51.704Z] Request  2 - answer: Process   1196 says hello!
[2019-08-30T20:57:51.727Z] Request  7 - answer: Process  10868 says hello!
[2019-08-30T20:57:51.763Z] Request  8 - answer: Process    208 says hello!
[2019-08-30T20:57:51.786Z] Request  6 - answer: Process  12356 says hello!
[2019-08-30T20:57:51.801Z] Request  4 - answer: Process  12500 says hello!
[2019-08-30T20:57:53.677Z] Request 14 - answer: Process  10232 says hello!
[2019-08-30T20:57:53.702Z] Request 31 - answer: Process   1732 says hello!
[2019-08-30T20:57:53.705Z] Request  1 - answer: Process  10948 says hello!
[2019-08-30T20:57:53.777Z] Request  3 - answer: Process   1196 says hello!
[2019-08-30T20:57:53.786Z] Request 12 - answer: Process  10868 says hello!
[2019-08-30T20:57:53.821Z] Request 17 - answer: Process    208 says hello!
[2019-08-30T20:57:53.853Z] Request  9 - answer: Process  12356 says hello!
[2019-08-30T20:57:53.891Z] Request 13 - answer: Process  12500 says hello!
[2019-08-30T20:57:55.715Z] Request 11 - answer: Process  10948 says hello!
[2019-08-30T20:57:55.786Z] Request 15 - answer: Process   1196 says hello!
[2019-08-30T20:57:55.860Z] Request 16 - answer: Process  12356 says hello!
[2019-08-30T20:57:55.898Z] Request 19 - answer: Process  12500 says hello!
[2019-08-30T20:57:57.721Z] Request 18 - answer: Process  10948 says hello!
[2019-08-30T20:57:57.792Z] Request 21 - answer: Process   1196 says hello!
[2019-08-30T20:57:57.866Z] Request 20 - answer: Process  12356 says hello!
[2019-08-30T20:57:57.903Z] Request 23 - answer: Process  12500 says hello!
[2019-08-30T20:57:59.891Z] Request 22 - answer: Process  12356 says hello!
[2019-08-30T20:57:59.922Z] Request 25 - answer: Process  12500 says hello!
[2019-08-30T20:58:01.894Z] Request 24 - answer: Process  12356 says hello!
[2019-08-30T20:58:01.925Z] Request 27 - answer: Process  12500 says hello!
[2019-08-30T20:58:03.897Z] Request 26 - answer: Process  12356 says hello!
[2019-08-30T20:58:03.928Z] Request 29 - answer: Process  12500 says hello!
[2019-08-30T20:58:05.903Z] Request 28 - answer: Process  12356 says hello!
[2019-08-30T20:58:07.906Z] Request 30 - answer: Process  12356 says hello!

As indicated, fewer and fewer queries are handled at once as the time passes. In the end, only Process #12356 still does something. Traces show no exiting from any process.

0

There are 0 best solutions below