[Edit: I added some profiling logs and process exit traces to the code, and pasted the results at the end of the post]
I can't get Nodejs cluster to properly make use of the server's available CPUs while using port sharing. The problem is observed on two different servers (personal computer and remote VPS), as indicated below, and it has been reproduced by a commenter (see comments from jfriend00).
Node Version: v10.13.0; Windows 7 64 bits; 8 cores
Node Version: v8.1.3; CentOS 6.9; 4 cores
The bench is as follows: a worker is created for each available CPU, listens on port 8000 (using Express) and replies after a simulated processing time of 2 seconds (the process is supposed totally synced for the benching, with no async operation).
cluster_bench.js
var cluster = require('cluster');
var express = require('express');
if(cluster.isMaster) {
var numWorkers = require('os').cpus().length;
console.log('[' + new Date().toISOString() + '] Master cluster setting up ' + numWorkers + ' workers...');
for(var i = 0; i < numWorkers; i++) {
cluster.fork();
}
cluster.on('online', function(worker) {
console.log('[' + new Date().toISOString() + '] Worker ' + (''+worker.process.pid).padStart(6, ' ') + ' is online');
});
cluster.on('exit', function(worker, code, signal) {
console.log('[' + new Date().toISOString() + '] Worker ' + (''+worker.process.pid).padStart(6, ' ') + ' died with code: ' + code + ', and signal: ' + signal);
console.log('[' + new Date().toISOString() + '] Starting a new worker');
cluster.fork();
});
} else {
var app = express();
const pid = (''+process.pid).padStart(6, ' ');
app.all('/*', function(req, res) {
const requestNum = req.query.num;
console.log('[' + new Date().toISOString() + '] Process ' + pid + ' handles query #' + requestNum);
// simulates a 10s-long process
const now = new Date().getTime();
let waittime = 2000; // 2 seconds
while (new Date().getTime() < now + waittime) { /* simulated synced processing */ };
console.log('[' + new Date().toISOString() + '] Process ' + pid + ' sends answer to query #' + requestNum);
res.send('Process ' + pid + ' says hello!');
});
var port = 8000;
var server = app.listen(port, function() {
console.log('[' + new Date().toISOString() + '] Process ' + pid + ' is listening on port ' + port);
});
}
Now I bench this script with 32 concurrent requests:
cluster_client.js
const http = require('http');
// Send 32 concurrent calls
for(var i=0; i<32; i++) {
let id = (''+i).padStart(2, ' ');
console.log('[' + new Date().toISOString() + '] Request ' + id + ' launched.');
http.get('http://localhost:8000?num=' + id, (resp) => {
let data = '';
resp.on('data', chunk => data += chunk );
resp.on('end', () => console.log('[' + new Date().toISOString() + '] Request ' + id + ' - answer: ' + data) );
});
}
I would expect my calls to be processed by batches of n-CPUs.
Yet, I observe this behavior:
- first batch actually makes use of the n CPUs
- later batches make use of less and less CPUs
- last batches only make use of only one CPU, so the last requests are answered each after the other, in a very long time
For example, on a 8-core computer: first 8 requests processed at once, then another 8 at once, then 4 at once, 4 at once, 2 at once, 2 at once, 2 at once, 1 at once, 1 at once.
On Windows, if I launch my client script a second time without restarting the server, then only 2 of the existing workers will handle the 32 requests. On CentOS, the behavior is the same as the first launch (first using all CPUs, then less and less).
No 'exit' event is produced by any workers. I also placed hooks with logs on the child processes events (all events described in https://nodejs.org/api/process.html#process_process_events), and observed that no event of any kind was emitted for any of the child processes. Also, jfriend00 has checked the actual sending of all the requests by the client with WireShark (see comment), which are correctly sent.
How can this behavior be?
Here's the traces I get from the server:
[2019-08-30T20:57:42.078Z] Master cluster setting up 8 workers...
[2019-08-30T20:57:42.781Z] Worker 1732 is online
[2019-08-30T20:57:42.798Z] Worker 208 is online
[2019-08-30T20:57:42.866Z] Worker 10868 is online
[2019-08-30T20:57:43.032Z] Worker 10232 is online
[2019-08-30T20:57:43.044Z] Worker 12356 is online
[2019-08-30T20:57:43.113Z] Worker 12500 is online
[2019-08-30T20:57:43.208Z] Worker 1196 is online
[2019-08-30T20:57:43.315Z] Worker 10948 is online
[2019-08-30T20:57:43.738Z] Process 1732 is listening on port 8000
[2019-08-30T20:57:43.779Z] Process 208 is listening on port 8000
[2019-08-30T20:57:43.891Z] Process 10868 is listening on port 8000
[2019-08-30T20:57:44.018Z] Process 10232 is listening on port 8000
[2019-08-30T20:57:44.018Z] Process 12356 is listening on port 8000
[2019-08-30T20:57:44.039Z] Process 12500 is listening on port 8000
[2019-08-30T20:57:44.097Z] Process 1196 is listening on port 8000
[2019-08-30T20:57:44.159Z] Process 10948 is listening on port 8000
[2019-08-30T20:57:49.570Z] Process 10232 handles query # 5
[2019-08-30T20:57:49.571Z] Process 10948 handles query # 0
[2019-08-30T20:57:49.596Z] Process 1732 handles query #10
[2019-08-30T20:57:49.607Z] Process 10868 handles query # 7
[2019-08-30T20:57:49.610Z] Process 1196 handles query # 2
[2019-08-30T20:57:49.611Z] Process 208 handles query # 8
[2019-08-30T20:57:49.613Z] Process 12356 handles query # 6
[2019-08-30T20:57:49.628Z] Process 12500 handles query # 4
[2019-08-30T20:57:51.586Z] Process 10232 sends answer to query # 5
[2019-08-30T20:57:51.613Z] Process 10948 sends answer to query # 0
[2019-08-30T20:57:51.620Z] Process 1732 sends answer to query #10
[2019-08-30T20:57:51.655Z] Process 10868 sends answer to query # 7
[2019-08-30T20:57:51.656Z] Process 1196 sends answer to query # 2
[2019-08-30T20:57:51.666Z] Process 10232 handles query #14
[2019-08-30T20:57:51.684Z] Process 208 sends answer to query # 8
[2019-08-30T20:57:51.693Z] Process 1732 handles query #31
[2019-08-30T20:57:51.695Z] Process 10948 handles query # 1
[2019-08-30T20:57:51.734Z] Process 12356 sends answer to query # 6
[2019-08-30T20:57:51.755Z] Process 12500 sends answer to query # 4
[2019-08-30T20:57:51.770Z] Process 1196 handles query # 3
[2019-08-30T20:57:51.774Z] Process 10868 handles query #12
[2019-08-30T20:57:51.815Z] Process 208 handles query #17
[2019-08-30T20:57:51.841Z] Process 12356 handles query # 9
[2019-08-30T20:57:51.885Z] Process 12500 handles query #13
[2019-08-30T20:57:53.671Z] Process 10232 sends answer to query #14
[2019-08-30T20:57:53.697Z] Process 1732 sends answer to query #31
[2019-08-30T20:57:53.699Z] Process 10948 sends answer to query # 1
[2019-08-30T20:57:53.709Z] Process 10948 handles query #11
[2019-08-30T20:57:53.771Z] Process 1196 sends answer to query # 3
[2019-08-30T20:57:53.781Z] Process 10868 sends answer to query #12
[2019-08-30T20:57:53.780Z] Process 1196 handles query #15
[2019-08-30T20:57:53.816Z] Process 208 sends answer to query #17
[2019-08-30T20:57:53.848Z] Process 12356 sends answer to query # 9
[2019-08-30T20:57:53.856Z] Process 12356 handles query #16
[2019-08-30T20:57:53.886Z] Process 12500 sends answer to query #13
[2019-08-30T20:57:53.895Z] Process 12500 handles query #19
[2019-08-30T20:57:55.712Z] Process 10948 sends answer to query #11
[2019-08-30T20:57:55.718Z] Process 10948 handles query #18
[2019-08-30T20:57:55.782Z] Process 1196 sends answer to query #15
[2019-08-30T20:57:55.788Z] Process 1196 handles query #21
[2019-08-30T20:57:55.856Z] Process 12356 sends answer to query #16
[2019-08-30T20:57:55.862Z] Process 12356 handles query #20
[2019-08-30T20:57:55.896Z] Process 12500 sends answer to query #19
[2019-08-30T20:57:55.901Z] Process 12500 handles query #23
[2019-08-30T20:57:57.719Z] Process 10948 sends answer to query #18
[2019-08-30T20:57:57.789Z] Process 1196 sends answer to query #21
[2019-08-30T20:57:57.863Z] Process 12356 sends answer to query #20
[2019-08-30T20:57:57.889Z] Process 12356 handles query #22
[2019-08-30T20:57:57.901Z] Process 12500 sends answer to query #23
[2019-08-30T20:57:57.919Z] Process 12500 handles query #25
[2019-08-30T20:57:59.889Z] Process 12356 sends answer to query #22
[2019-08-30T20:57:59.891Z] Process 12356 handles query #24
[2019-08-30T20:57:59.920Z] Process 12500 sends answer to query #25
[2019-08-30T20:57:59.922Z] Process 12500 handles query #27
[2019-08-30T20:58:01.892Z] Process 12356 sends answer to query #24
[2019-08-30T20:58:01.894Z] Process 12356 handles query #26
[2019-08-30T20:58:01.923Z] Process 12500 sends answer to query #27
[2019-08-30T20:58:01.926Z] Process 12500 handles query #29
[2019-08-30T20:58:03.895Z] Process 12356 sends answer to query #26
[2019-08-30T20:58:03.899Z] Process 12356 handles query #28
[2019-08-30T20:58:03.926Z] Process 12500 sends answer to query #29
[2019-08-30T20:58:05.901Z] Process 12356 sends answer to query #28
[2019-08-30T20:58:05.903Z] Process 12356 handles query #30
[2019-08-30T20:58:07.904Z] Process 12356 sends answer to query #30
and from the client side:
[2019-08-30T20:57:49.293Z] Request 0 launched.
[2019-08-30T20:57:49.376Z] Request 1 launched.
[2019-08-30T20:57:49.380Z] Request 2 launched.
[2019-08-30T20:57:49.381Z] Request 3 launched.
[2019-08-30T20:57:49.384Z] Request 4 launched.
[2019-08-30T20:57:49.387Z] Request 5 launched.
[2019-08-30T20:57:49.390Z] Request 6 launched.
[2019-08-30T20:57:49.393Z] Request 7 launched.
[2019-08-30T20:57:49.396Z] Request 8 launched.
[2019-08-30T20:57:49.404Z] Request 9 launched.
[2019-08-30T20:57:49.407Z] Request 10 launched.
[2019-08-30T20:57:49.410Z] Request 11 launched.
[2019-08-30T20:57:49.413Z] Request 12 launched.
[2019-08-30T20:57:49.421Z] Request 13 launched.
[2019-08-30T20:57:49.425Z] Request 14 launched.
[2019-08-30T20:57:49.427Z] Request 15 launched.
[2019-08-30T20:57:49.435Z] Request 16 launched.
[2019-08-30T20:57:49.438Z] Request 17 launched.
[2019-08-30T20:57:49.441Z] Request 18 launched.
[2019-08-30T20:57:49.444Z] Request 19 launched.
[2019-08-30T20:57:49.452Z] Request 20 launched.
[2019-08-30T20:57:49.455Z] Request 21 launched.
[2019-08-30T20:57:49.458Z] Request 22 launched.
[2019-08-30T20:57:49.461Z] Request 23 launched.
[2019-08-30T20:57:49.469Z] Request 24 launched.
[2019-08-30T20:57:49.472Z] Request 25 launched.
[2019-08-30T20:57:49.475Z] Request 26 launched.
[2019-08-30T20:57:49.479Z] Request 27 launched.
[2019-08-30T20:57:49.487Z] Request 28 launched.
[2019-08-30T20:57:49.489Z] Request 29 launched.
[2019-08-30T20:57:49.493Z] Request 30 launched.
[2019-08-30T20:57:49.496Z] Request 31 launched.
[2019-08-30T20:57:51.637Z] Request 5 - answer: Process 10232 says hello!
[2019-08-30T20:57:51.651Z] Request 0 - answer: Process 10948 says hello!
[2019-08-30T20:57:51.654Z] Request 10 - answer: Process 1732 says hello!
[2019-08-30T20:57:51.704Z] Request 2 - answer: Process 1196 says hello!
[2019-08-30T20:57:51.727Z] Request 7 - answer: Process 10868 says hello!
[2019-08-30T20:57:51.763Z] Request 8 - answer: Process 208 says hello!
[2019-08-30T20:57:51.786Z] Request 6 - answer: Process 12356 says hello!
[2019-08-30T20:57:51.801Z] Request 4 - answer: Process 12500 says hello!
[2019-08-30T20:57:53.677Z] Request 14 - answer: Process 10232 says hello!
[2019-08-30T20:57:53.702Z] Request 31 - answer: Process 1732 says hello!
[2019-08-30T20:57:53.705Z] Request 1 - answer: Process 10948 says hello!
[2019-08-30T20:57:53.777Z] Request 3 - answer: Process 1196 says hello!
[2019-08-30T20:57:53.786Z] Request 12 - answer: Process 10868 says hello!
[2019-08-30T20:57:53.821Z] Request 17 - answer: Process 208 says hello!
[2019-08-30T20:57:53.853Z] Request 9 - answer: Process 12356 says hello!
[2019-08-30T20:57:53.891Z] Request 13 - answer: Process 12500 says hello!
[2019-08-30T20:57:55.715Z] Request 11 - answer: Process 10948 says hello!
[2019-08-30T20:57:55.786Z] Request 15 - answer: Process 1196 says hello!
[2019-08-30T20:57:55.860Z] Request 16 - answer: Process 12356 says hello!
[2019-08-30T20:57:55.898Z] Request 19 - answer: Process 12500 says hello!
[2019-08-30T20:57:57.721Z] Request 18 - answer: Process 10948 says hello!
[2019-08-30T20:57:57.792Z] Request 21 - answer: Process 1196 says hello!
[2019-08-30T20:57:57.866Z] Request 20 - answer: Process 12356 says hello!
[2019-08-30T20:57:57.903Z] Request 23 - answer: Process 12500 says hello!
[2019-08-30T20:57:59.891Z] Request 22 - answer: Process 12356 says hello!
[2019-08-30T20:57:59.922Z] Request 25 - answer: Process 12500 says hello!
[2019-08-30T20:58:01.894Z] Request 24 - answer: Process 12356 says hello!
[2019-08-30T20:58:01.925Z] Request 27 - answer: Process 12500 says hello!
[2019-08-30T20:58:03.897Z] Request 26 - answer: Process 12356 says hello!
[2019-08-30T20:58:03.928Z] Request 29 - answer: Process 12500 says hello!
[2019-08-30T20:58:05.903Z] Request 28 - answer: Process 12356 says hello!
[2019-08-30T20:58:07.906Z] Request 30 - answer: Process 12356 says hello!
As indicated, fewer and fewer queries are handled at once as the time passes. In the end, only Process #12356 still does something. Traces show no exiting from any process.