I'm trying to solve the following scenario in Node.js in a performant manner.
- I have about 100 MB worth of JSON files which I need to process, and the time complexity of processing each entry is roughly O(sweet_jesus(n)). In real time, each entry takes about 4-5 seconds.
- The only silver lining is that the processing of each entry (about 900 entries in total) can run individually; they are completely unrelated.
My first choice was to go for `worker_threads` with `node-worker-threads-pool`:
```js
import fs from 'fs';
import path from 'path';
import workerPool from 'node-worker-threads-pool';

function generateShortEvaluationsByWorkers() {
  // Fixed-size pool; each worker runs the task script below.
  const pool = new workerPool.StaticPool({
    size: 10,
    task: path.resolve('src/simulator/evaluationGenerator.js')
  });

  const simulationEvaluations = [];
  const promises = [];

  fs.readdirSync(path.resolve('results/companies')).forEach(file => {
    const rawData = fs.readFileSync(path.resolve(`results/companies/${file}`));
    const company = JSON.parse(rawData);
    console.log(new Date(), ': company parsed, sending it for processing:', file);
    // exec() hands the parsed object to a free worker and resolves with its result.
    promises.push(pool.exec(company).then(result => {
      simulationEvaluations.push(result);
    }));
  });

  Promise.all(promises).then(() => {
    fs.writeFileSync(
      path.resolve('results/bundles/simulationEvaluations.json'),
      JSON.stringify(simulationEvaluations, null, 2)
    );
    pool.destroy();
  });
}
```
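For context, `node-worker-threads-pool` expects a file-based task to listen for messages on `parentPort` and post a result back. My actual `evaluationGenerator.js` isn't shown here; a minimal sketch of the shape it needs (with `evaluateCompany` as a stand-in for the real per-entry work) would be:

```js
// src/simulator/evaluationGenerator.js (sketch)
import { parentPort } from 'worker_threads';

// Stand-in for the real O(sweet_jesus(n)) per-entry computation.
function evaluateCompany(company) {
  // ...heavy CPU-bound work on the parsed company object...
  return { name: company.name /* , ...computed evaluation... */ };
}

// The pool delivers the argument of pool.exec() as a message;
// whatever we post back resolves the corresponding exec() promise.
parentPort.on('message', company => {
  parentPort.postMessage(evaluateCompany(company));
});
```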
The above code runs beautifully; it shows that the I/O - reading all the files and feeding them to the pool - takes about 5-6 seconds...
But after that there is absolutely no difference whatsoever compared to running the whole thing in a single thread. The logs do show that the individual files are no longer processed in order as before, so I guess some threading is happening in the background, but the total time does not change one bit. It takes about an hour either way.
Also, my hyper-threaded Intel i7-8750H with 6 cores (12 logical) shows 86% utilization going to the node process. So my alleged 10 separate threads don't even manage to saturate one full core. - EDIT: I was being an idiot, it does make a huge difference; I just wrote the times down wrong...
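(For what it's worth, timing the run in code instead of eyeballing log timestamps would have caught this immediately. A minimal sketch, assuming `generateShortEvaluationsByWorkers()` is changed to return its `Promise.all(...)` chain so the caller can await it:)

```js
// Wall-clock timing of the whole run; `performance` is a Node global (16+).
const start = performance.now();
await generateShortEvaluationsByWorkers(); // assumes it returns the Promise.all chain
console.log(`total wall-clock time: ${((performance.now() - start) / 1000).toFixed(1)} s`);
```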
After this I cranked the thread pool size up to 100 and sliced the number of files down to 100. And that's where the freaky stuff started to happen. First, all my CPU cores went brrrr and my laptop properly melted through the table, as one would expect. The OS became completely unresponsive; everything was a slideshow. The first 20 or so files got processed within the same second, after which the processing of individual files slowed to ~3 seconds each (neatly one after another, each message 3-5 seconds after the previous). The last 10 or so files got processed within the same second again.
- Why don't 10 threads make any difference compared to 1 thread?
- Shouldn't I see files being processed in clusters, with the cluster size comparable to the number of logical cores, instead of timestamps trickling in one after another?
- Is there a way to "leave" a core free to process something else, while the calculations still go to Neptune on all the other cores?
EDIT: I won't delete this, maybe somebody will learn from it :) So, to answer my own questions:
- It does; I could not measure properly, wrote the times down wrong, and could not read my CPU meter right either at that point... totally my fault.
- This one I still don't fully get, but after a few runs I suspect that when you start a whole buttload of threads, the sheer strain of starting them all makes the whole system hang so much that by the time it is able to spew out the first log line, it is already done with a bunch of the calculations.
- Yeah, this one is also kind of obvious: do not use so many threads that the thread management itself makes the OS throw a shitfit.
In the end I got the best results with 11 threads btw - one fewer than my number of logical cores.
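That magic number matches one worker per logical core with one core left free. A sketch of deriving it from the machine instead of hard-coding it (`os.availableParallelism()` needs Node 18.14+; `os.cpus().length` is the older fallback):

```js
import os from 'os';
import path from 'path';
import workerPool from 'node-worker-threads-pool';

// One worker per logical core, minus one left free for the OS and the
// main thread - on a 6-core/12-thread machine this yields the 11 above.
const logicalCores = os.availableParallelism?.() ?? os.cpus().length;

const pool = new workerPool.StaticPool({
  size: Math.max(1, logicalCores - 1),
  task: path.resolve('src/simulator/evaluationGenerator.js')
});
```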