I’m new to systems programming and am toying with io_uring. I have the start of a design for a networked program and have identified some CPU-bound work that I think should be offloaded to a thread pool. I’m unsure, though, how to synchronize that work with the ring thread and what the associated tradeoffs are.
The first solution that came to mind was a shared structure (e.g. a queue) locked with a pthread_mutex and signalled with a pthread_cond. This seems inappropriate: both io_uring_enter and pthread_cond_wait block, and I doubt either returns often enough, and under the right conditions, for the two to live in the same loop. Even if they did, it seems clumsy to introduce another syscall into this hot loop.
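For concreteness, a rough sketch of the kind of locked queue I mean (the names, the fixed capacity, and the lack of backpressure are placeholders for this sketch):

```c
/* A pointer queue guarded by a pthread mutex and condition variable.
 * The snag: the ring thread already blocks in io_uring_enter (e.g. via
 * io_uring_wait_cqe), and it can't also block in pthread_cond_wait in
 * the same loop. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QCAP 64

struct ptr_queue {
    void           *items[QCAP];
    size_t          head, len;
    pthread_mutex_t mu;        /* PTHREAD_MUTEX_INITIALIZER */
    pthread_cond_t  not_empty; /* PTHREAD_COND_INITIALIZER  */
};

static void queue_push(struct ptr_queue *q, void *p)    /* producer side */
{
    pthread_mutex_lock(&q->mu);
    assert(q->len < QCAP);                               /* sketch: no backpressure */
    q->items[(q->head + q->len) % QCAP] = p;
    q->len++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mu);
}

static void *queue_pop(struct ptr_queue *q)              /* consumer side */
{
    pthread_mutex_lock(&q->mu);
    while (q->len == 0)
        pthread_cond_wait(&q->not_empty, &q->mu);        /* a second blocking wait */
    void *p = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->len--;
    pthread_mutex_unlock(&q->mu);
    return p;
}
```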
My current design involves a pair of file descriptors shared between the ring and the thread pool, one for each direction. Since there’s only one process, pointers are suitable as the messages passed over the descriptors. If reads and writes are atomic, then the descriptors themselves provide the synchronization: plain read/write calls suffice on the pool side and regular io_uring operations on the ring side, with no need to lock either those operations or the memory behind the pointers.
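Roughly, the shape I have in mind looks like this (a sketch only: struct job, the squaring “work”, and the single pool thread are placeholders, and error handling is mostly omitted):

```c
#include <liburing.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct job { int in, out; };                  /* placeholder work item */
static int to_pool[2], from_pool[2];          /* [0] = read end, [1] = write end */

/* Pool side: nothing but blocking read()/write() on the pipes. A single
 * pointer is 8 bytes, far below PIPE_BUF, so these should be atomic.   */
static void *pool_thread(void *arg)
{
    (void)arg;
    struct job *j;
    while (read(to_pool[0], &j, sizeof j) == sizeof j) {
        j->out = j->in * j->in;               /* stand-in for CPU-bound work */
        if (write(from_pool[1], &j, sizeof j) != sizeof j)
            break;
    }
    return NULL;
}

int main(void)
{
    if (pipe(to_pool) || pipe(from_pool))
        return 1;

    pthread_t t;
    pthread_create(&t, NULL, pool_thread, NULL);

    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);

    /* Ring side: dispatch one job with an io_uring write and arm an
     * io_uring read for the result. The pointer slots handed to io_uring
     * must stay valid until the operations complete.                   */
    struct job job = { .in = 7 };
    struct job *out_ptr = &job, *in_ptr = NULL;

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_write(sqe, to_pool[1], &out_ptr, sizeof out_ptr, 0);
    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read(sqe, from_pool[0], &in_ptr, sizeof in_ptr, 0);
    io_uring_submit(&ring);

    /* Wait for both completions; afterwards in_ptr points at a finished job. */
    for (int i = 0; i < 2; i++) {
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        io_uring_cqe_seen(&ring, cqe);
    }
    printf("result: %d\n", in_ptr->out);

    close(to_pool[1]);                        /* lets the pool thread exit */
    pthread_join(t, NULL);
    io_uring_queue_exit(&ring);
    return 0;
}
```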
I’m also aware of IORING_OP_MSG_RING, but since the CPU-bound work can be cleanly separated out, I’d rather schedule any subsequent IO on the (single) ring right after.
- Are there any obvious problems with this approach?
- Am I right to think that an anonymous pipe is the most appropriate type of descriptor to use here?
- Is the PIPE_BUF limit the only condition for atomicity?
- Are there other approaches I’m not considering? Is this among the most performant?
I found two excerpts from Jens Axboe that name IORING_OP_MSG_RING as a solution to this problem specifically: one in the slide deck (slide 31) of a 2022 Kernel Recipes talk, and one in the (currently, only) wiki article on the liburing GitHub page, io_uring and networking in 2023.

Here’s a demonstration of this pattern, as I understand it.
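Each worker owns a small ring of its own and, once its CPU-bound work is done, posts a pointer to the finished job into the main ring’s completion queue. This is only a sketch: it assumes liburing 2.2 or newer for io_uring_prep_msg_ring, struct job and the squaring stand in for real work, and error handling is omitted:

```c
#include <liburing.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define N_JOBS 4

static struct io_uring main_ring;             /* owned by the ring thread */

struct job { int input; int result; };

/* Worker: do the CPU-bound work, then post a pointer to the finished job
 * into the main ring's completion queue via IORING_OP_MSG_RING.        */
static void *worker(void *arg)
{
    struct job *job = arg;
    struct io_uring worker_ring;
    io_uring_queue_init(8, &worker_ring, 0);

    job->result = job->input * job->input;    /* stand-in for real work */

    struct io_uring_sqe *sqe = io_uring_get_sqe(&worker_ring);
    /* len = 0; the job pointer travels as the target CQE's user_data.  */
    io_uring_prep_msg_ring(sqe, main_ring.ring_fd, 0, (__u64)(uintptr_t)job, 0);
    io_uring_submit_and_wait(&worker_ring, 1); /* wait for the local msg CQE */

    io_uring_queue_exit(&worker_ring);
    return NULL;
}

int main(void)
{
    io_uring_queue_init(8, &main_ring, 0);

    struct job jobs[N_JOBS];                   /* the array on the stack */
    pthread_t threads[N_JOBS];
    for (int i = 0; i < N_JOBS; i++) {
        jobs[i].input = i;
        pthread_create(&threads[i], NULL, worker, &jobs[i]);
    }

    /* Ring thread: each msg_ring completion carries a job pointer in user_data. */
    for (int i = 0; i < N_JOBS; i++) {
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&main_ring, &cqe);
        struct job *done = (struct job *)(uintptr_t)cqe->user_data;
        printf("job %d -> %d\n", done->input, done->result);
        io_uring_cqe_seen(&main_ring, cqe);
    }

    for (int i = 0; i < N_JOBS; i++)
        pthread_join(threads[i], NULL);
    io_uring_queue_exit(&main_ring);
    return 0;
}
```

The pointer value itself is copied into the SQE at prep time and delivered in the target CQE’s user_data, so no extra buffer has to stay alive for the message; the job it points to, of course, must live until the ring thread has consumed it.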
A few interesting points:
- Sending the pointers with io_uring writes (as in the pipe design) is less convenient, because the values written need to live until completion. Using the above example as a model, in order to send pointers to the array on the stack, a separate array of pointers would need to be allocated. A queue to manage writes to the queue, essentially.
- IORING_SETUP_SINGLE_ISSUER and IORING_SETUP_ATTACH_WQ look like obvious optimizations.
- The description of IORING_SETUP_COOP_TASKRUN lays out a compelling case, but I’m still unsure whether IORING_SETUP_DEFER_TASKRUN is beneficial here.

As I’ve indicated above, I’m a bit out of my depth here, so I’m writing this answer only prospectively and to make my progress so far available. If anyone coming after knows better, I’d be happy to accept another answer or amend this one.