I have a dual-core 1GHz ARM system running Linux where the business application must do low-latency serial IO: <10ms response time, receiving a ~10-byte request and sending a ~20-byte response. After a few iterations of design refinement the results are fairly good, but there are still occasional missed deadlines.
Doing system-wide tracing with Perfetto.dev revealed the following:
- missed deadlines occur when both cores are saturated
- the application thread has elevated priority and once it becomes runnable it enters running state in <100us and performs its job within 2ms (well within spec)
- the application thread is woken up by kworker/u4:1-events_unbound (handling the serial hw), which has the same priority as ordinary user threads
- when a deadline is missed it is the kworker that's starved (stays in runnable state, waiting for available CPU resource)
Since the CPU becomes saturated from time to time, I see no other option than to prioritize threads that have a deadline over the rest. But changing the priority of kworker threads (renice, chrt) smells, to say the least:
- they are owned by the kernel, it doesn't feel right to change their prio
- they are created and destroyed at runtime, so scanning them and setting their prio once at startup is not viable
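For reference, the scan-and-boost approach I am reluctant to use would look roughly like the sketch below. It assumes the workers can be recognized by a `kworker/u` prefix in `/proc/<pid>/comm`, that the script runs as root (or with CAP_SYS_NICE), and that SCHED_FIFO priority 1 is an acceptable value; the function names and the priority are my own placeholders, not anything the kernel prescribes.

```python
import os


def is_unbound_kworker(comm):
    """Return True for names like 'kworker/u4:1-events_unbound'."""
    return comm.startswith("kworker/u")


def find_unbound_kworkers(proc_root="/proc"):
    """Collect PIDs of unbound kworker threads currently listed in /proc."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # the thread went away while we were scanning
        if is_unbound_kworker(comm):
            pids.append(int(entry))
    return pids


def boost(pids, priority=1):
    """Try to move each PID to SCHED_FIFO; ignore threads that vanished."""
    param = os.sched_param(priority)
    for pid in pids:
        try:
            os.sched_setscheduler(pid, os.SCHED_FIFO, param)
        except (ProcessLookupError, PermissionError):
            pass  # worker exited, or we lack the privilege


if __name__ == "__main__":
    boost(find_unbound_kworkers())
```

Because workers come and go, this would have to be re-run periodically, which is exactly what makes it feel wrong.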
What is the proper way to solve this?
Putting more cores to work is not an option. Even when the CPU is saturated, the traces show that the deadline could easily be met if the right thing were executed instead of some background activity.
--
This trace depicts a single occurrence of the issue:
- the flag indicates the timepoint where traffic arrives: 6 bytes
- at this point kworker becomes runnable, but stays in this state for 6.7ms
- during this time multiple processes are scheduled on both cores
- when kworker eventually runs, it wakes my thread and the application code runs
