libuv worker threads or work queue health check?


In libuv, you can end up tying up the worker threads with too much work or buggy code. Is there a simple function that can check the health of the worker threads or the work queue? It doesn't have to be 100% deterministic; after all, it would be impossible to determine whether a worker thread is stuck on slow code or in an infinite loop.

So any of the following heuristics would be good:

  • Number of queued items not yet worked on. If this is too large, it could mean the worker threads are busy or hung.

  • Does libuv have any thread killing mechanism where if the worker thread doesn't check back in n seconds, it gets terminated?


There are 2 best solutions below

0
On

If this is for Node.js, would a simple monitor thread do? I don't know of a way to get information about the event queue internals, but you can inject a tracer into the event queue to check that threads are being run in a timely manner. (This measures load not by the number of threads not yet run, but by whether the threads are getting run on time. Same thing, kind of.)

A monitor thread could re-queue itself and check that it gets called at least every 10 milliseconds (or whatever maximum cumulative blocking time is allowed). Since Node.js runs threads round-robin, if the monitor thread ran on time, all other threads got a chance to run within that same 10 ms window. Something like (in Node):

// like Date.now(), but with higher precision
// the extra precision is needed to be able to track small delays
function dateNow() {
    var t = process.hrtime();
    return (t[0] + t[1] * 1e-9) * 1000;
}

var _lastTimestamp = dateNow();   // when healthMonitor ran last, in ms
var _maxAllowedDelay = 10.0;      // max ms delay we allow for our task to run
function healthMonitor() {
    var now = dateNow();
    var delay = now - _lastTimestamp;
    if (delay > _maxAllowedDelay) {
        console.log("healthMonitor was late:", delay, " > ", _maxAllowedDelay);
    }
    _lastTimestamp = now;
    setTimeout(healthMonitor, 1);
}

// launch the health monitor and run it forever
// note: the node process will never exit, it will have to be killed
healthMonitor();

Throttling the alert messages and supporting a clean shutdown is an exercise left to the reader.
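
One possible take on that exercise, as a minimal sketch rather than a finished implementation. The names `createHealthMonitor`, `minAlertGap`, `onLate` and `check` are made up for illustration and are not part of Node.js. The sketch assumes that at most one alert per `minAlertGap` milliseconds is enough, and it calls `unref()` on the timer so the monitor alone does not keep the process alive.

```javascript
// Same high-precision clock as above, in milliseconds.
function dateNow() {
    var t = process.hrtime();
    return (t[0] + t[1] * 1e-9) * 1000;
}

// Hypothetical helper: a health monitor with throttled alerts and stop().
function createHealthMonitor(options) {
    var opts = options || {};
    var intervalMs = opts.intervalMs || 1;             // re-queue interval, ms
    var maxAllowedDelay = opts.maxAllowedDelay || 10;  // max ms between runs
    var minAlertGap = opts.minAlertGap || 1000;        // min ms between alerts
    var onLate = opts.onLate || function (delay) {
        console.log("healthMonitor was late:", delay, ">", maxAllowedDelay);
    };

    var last = null;           // timestamp of the previous sample, or null
    var lastAlert = -Infinity; // timestamp of the last alert we emitted
    var timer = null;
    var running = false;

    // Record one sample taken at time `now`; returns true if an alert fired.
    function check(now) {
        var fired = false;
        if (last !== null) {
            var delay = now - last;
            if (delay > maxAllowedDelay && now - lastAlert >= minAlertGap) {
                lastAlert = now;
                onLate(delay);
                fired = true;
            }
        }
        last = now;
        return fired;
    }

    function tick() {
        check(dateNow());
        if (!running) return;   // stop() may have been called from onLate
        timer = setTimeout(tick, intervalMs);
        timer.unref();          // do not keep the process alive just for this
    }

    function start() {
        if (running) return;
        running = true;
        tick();
    }

    function stop() {
        running = false;
        if (timer !== null) clearTimeout(timer);
        timer = null;
        last = null;            // forget the old sample so a restart is clean
    }

    return { start: start, stop: stop, check: check };
}
```

Usage would be along the lines of `var monitor = createHealthMonitor({ maxAllowedDelay: 10 }); monitor.start();` and later `monitor.stop();` during shutdown. Dropping the `unref()` call restores the original run-forever behaviour.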

0
On

That function does not exist in libuv itself, and I am not aware of any OSS that provides something like that.
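
Since libuv does not expose such a count, one workaround is to keep it at the application level. The following is only a sketch under that assumption: `createPendingTracker` and `track` are hypothetical names, and the counter covers every task that was submitted and has not finished yet, so it cannot tell queued work apart from work that is currently running.

```javascript
// Hypothetical helper: counts tasks that were submitted but have not settled.
function createPendingTracker() {
    var pending = 0;

    // Wrap a promise-returning operation; returns the same promise.
    function track(promise) {
        pending++;
        var done = function () { pending--; };
        // Decrement on success or failure; the caller still sees the
        // original promise and handles its result or rejection.
        promise.then(done, done);
        return promise;
    }

    return {
        track: track,
        pending: function () { return pending; }
    };
}
```

In Node.js this could wrap, for example, the promise returned by `fs.promises.readFile`, which runs on the libuv thread pool. A `pending()` value that keeps growing would then hint that the workers are busy or hung.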

In terms of a killing mechanism, there is none baked into libuv, but http://nikhilm.github.io/uvbook/threads.html#core-thread-operations suggests:

A well designed program would have a way to terminate long running workers that have already started executing. Such a worker could periodically check for a variable that only the main process sets to signal termination.
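
To illustrate the pattern in that quote, here is a rough sketch in JavaScript rather than C. In a libuv program the flag would be checked inside the work callback passed to `uv_queue_work`; the version below applies the same idea to work that is split into batches. `createCancelToken`, `runCancellable` and the batch size of 100 are illustrative choices, not anything libuv or Node.js provides.

```javascript
// Hypothetical helper: a flag that only the controlling code sets.
function createCancelToken() {
    var token = { cancelled: false };
    token.cancel = function () { token.cancelled = true; };
    return token;
}

// Process `items` in batches, checking the token before each batch.
// Calls done(err, processedCount) when finished or cancelled.
// For small inputs done() may be invoked synchronously.
function runCancellable(items, workFn, token, done) {
    var i = 0;
    var batchSize = 100; // assumed batch size; tune for the workload

    function step() {
        if (token.cancelled) {
            done(new Error("cancelled"), i);
            return;
        }
        var end = Math.min(i + batchSize, items.length);
        for (; i < end; i++) {
            workFn(items[i]);
        }
        if (i < items.length) {
            setImmediate(step); // yield so other callbacks (and cancel) can run
        } else {
            done(null, i);
        }
    }

    step();
}
```

The worker only stops at batch boundaries, so a single `workFn` call that never returns still cannot be interrupted this way. That is the same limitation the quoted advice has.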