Best practice for a real-time periodic task (< 1 ms) on a multi-core Linux system


I am using a quad-core embedded computer running Linux to control a robot system.

The project is a multi-threaded, single-process program written in C++.
Here is some background and the requirements:

  • Some tasks (signal processing and communication with hardware) require fairly strict "real-time" operation.
    • The cycle period is 500 µs or 1000 µs (configurable).
    • The counterpart of the operation is so-called "hard real-time" hardware (with a dedicated DSP).
    • Occasionally missing 1 to 2 cycles (for example because of jitter) causes a hardly noticeable degradation of system operation.
    • Occasionally missing 3 to 9 cycles causes a quite noticeable, but not fatal, degradation of system performance.
    • Missing more than 10 cycles even once stops the whole system and is considered a fatal major malfunction.
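
To make the thresholds above concrete, here is a minimal sketch of how a watchdog could classify a run of missed cycles. The names `MissSeverity`, `classify_missed_cycles` and `missed_cycles` are hypothetical, and the requirements do not say how exactly 10 missed cycles should be treated, so this sketch assumes the conservative choice of treating it as fatal.

```cpp
#include <cassert>
#include <cstdint>

// Severity levels taken from the requirements listed above.
enum class MissSeverity { None, Minor, Degraded, Fatal };

// Classify a run of missed cycles.
// Assumption: the requirements leave "exactly 10" unspecified;
// this sketch conservatively treats 10 or more as fatal.
inline MissSeverity classify_missed_cycles(std::uint32_t missed) {
    if (missed == 0) return MissSeverity::None;
    if (missed <= 2) return MissSeverity::Minor;     // hardly noticeable
    if (missed <= 9) return MissSeverity::Degraded;  // noticeable, not fatal
    return MissSeverity::Fatal;                      // system must stop
}

// Convert the gap between two completed cycles into a number of missed
// cycles. A gap of exactly one period means nothing was missed.
inline std::uint32_t missed_cycles(std::uint64_t gap_us, std::uint64_t period_us) {
    if (period_us == 0 || gap_us < period_us) return 0;
    return static_cast<std::uint32_t>(gap_us / period_us - 1);
}
```

With a 500 µs period, a gap of 1500 µs between two completed cycles counts as 2 missed cycles, which this sketch classifies as minor.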

The solution I have in mind is a combination of the following.

  • Put all the functions required for real-time operation in a single thread and make it the "real-time thread".
  • Put the other functions in one or two threads and make them "non-real-time threads".
  • Set sched_priority of the real-time thread to around 95.
  • Reserve one CPU core for real-time operation: set the CPU affinity of all the default Linux services to the other 3 cores, and set the CPU affinity of the real-time thread to the reserved core (so that critical tasks get minimum interference).
  • Do inter-thread communication with std::atomic variables using release and acquire memory ordering, and keep it to a minimum.
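
The last point can be sketched as a one-slot mailbox between one producer and one consumer. This is only an illustration under stated assumptions: `Mailbox`, `try_publish`, `try_consume` and `handoff_demo` are hypothetical names, and the `yield()` calls are there so the demo behaves on a loaded machine, not something a real-time thread should do.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One-slot mailbox for exactly one producer and one consumer.
struct Mailbox {
    double payload = 0.0;            // plain data, guarded by `full`
    std::atomic<bool> full{false};   // true while the slot holds unread data
};

// Producer side: never blocks, returns false if the slot is still in use.
inline bool try_publish(Mailbox& m, double value) {
    if (m.full.load(std::memory_order_acquire)) return false;
    m.payload = value;                              // write the data first
    m.full.store(true, std::memory_order_release);  // then publish it
    return true;
}

// Consumer side: never blocks, returns false if nothing is available.
inline bool try_consume(Mailbox& m, double& out) {
    if (!m.full.load(std::memory_order_acquire)) return false;
    out = m.payload;                                 // safe after the acquire load
    m.full.store(false, std::memory_order_release);  // hand the slot back
    return true;
}

// Demo only: send 1..count through the mailbox and return the received sum.
inline double handoff_demo(int count) {
    Mailbox box;
    std::thread producer([&box, count] {
        for (int i = 1; i <= count; ++i)
            while (!try_publish(box, static_cast<double>(i)))
                std::this_thread::yield();
    });
    double sum = 0.0;
    for (int received = 0; received < count;) {
        double value = 0.0;
        if (try_consume(box, value)) { sum += value; ++received; }
        else std::this_thread::yield();
    }
    producer.join();
    return sum;
}
```

The release store pairs with the acquire load on the other side, so the plain `payload` write is visible before the flag flips. Neither side ever waits on a lock, which is the property that matters for the real-time thread.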

Would this be good practice for this application? Are there other practices that would make operation more stable?

SK-logic On

If you need your 1 ms periodic process to have very low jitter, you simply cannot rely on the system timers. Xenomai or not, the jitter will be significant.

The only reasonable method is to do the following (it's a typical approach in both robotics and low-latency financial applications):

  • Isolate one CPU core (isolcpus=...).
  • Build the kernel with NOHZ_FULL support and set up nohz_full for the isolated CPU core.
  • Configure rcu_nocbs for this isolated core.
  • Use sched_setaffinity(...) to bind your process to your chosen isolated CPU.
  • Use sched_setscheduler(...) to set the SCHED_FIFO policy for this process (or any other real-time scheduling policy; this tells the kernel to actually respect the NOHZ_FULL setting for this core).
  • You must have only ONE process running on that core.
  • Do not make any system calls in that real-time process. System calls enter the kernel and can result in context switches, which you must avoid at all costs.
  • Use busy waiting: spin whenever you need to wait for an event or a time period. Do not use system calls for sleeping. Your CPU load must always stay at 100%.
  • Use lock-free communication between your real-time process and the non-real-time supporting processes.
  • Be mindful of the DDR latency.
  • Make sure that everything you do in your real-time process loop terminates before the 1 ms timeout.
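
The affinity, scheduling and busy-wait steps above can be sketched roughly as follows. Treat it as an illustration, not a definitive implementation: `pin_to_cpu`, `set_fifo`, `LoopStats` and `spin_loop` are hypothetical names, and the sketch assumes that `std::chrono::steady_clock::now()` is served without entering the kernel (typically through the vDSO on x86-64 and arm64 Linux). A stricter variant would read a hardware cycle counter directly.

```cpp
#include <sched.h>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstdint>

// Bind the calling thread to one CPU. Returns 0 on success, errno otherwise.
// Assumption: the kernel was booted with that CPU isolated, for example
// isolcpus=3 nohz_full=3 rcu_nocbs=3 (the CPU number is only an example).
inline int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0 ? 0 : errno;
}

// Switch the calling thread to SCHED_FIFO. Usually needs root or
// CAP_SYS_NICE. Returns 0 on success, errno otherwise.
inline int set_fifo(int priority) {
    sched_param param{};
    param.sched_priority = priority;
    return sched_setscheduler(0, SCHED_FIFO, &param) == 0 ? 0 : errno;
}

struct LoopStats {
    std::uint64_t cycles;    // cycles executed
    std::uint64_t overruns;  // cycles whose work finished after the deadline
};

// Run `work` once per period for `n` cycles, spinning instead of sleeping.
// Call pin_to_cpu() and set_fifo() once before entering this loop.
template <class Work>
LoopStats spin_loop(std::chrono::microseconds period, std::uint64_t n, Work&& work) {
    using clock = std::chrono::steady_clock;
    LoopStats stats{0, 0};
    auto deadline = clock::now() + period;
    for (std::uint64_t i = 0; i < n; ++i) {
        work(i);
        if (clock::now() > deadline) ++stats.overruns;
        while (clock::now() < deadline) {
            // Busy wait: no sleep, no yield, the core stays at 100% load.
        }
        deadline += period;  // absolute deadlines, so errors do not accumulate
        ++stats.cycles;
    }
    return stats;
}
```

A typical call would be `spin_loop(std::chrono::microseconds(500), n, step)`, where `step` stands for the per-cycle work. Using absolute deadlines means one late cycle does not shift every later cycle.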

This way you will get sub-microsecond jitter, as long as the other cores or devices in your system do not clog the bus too much.

Keep in mind that your I/O in this process must be entirely in user space. You should try to avoid using system calls to talk to the peripherals. This is possible in some cases, if you can write directly to the control registers of the devices and use DMA to get the data out of them (i.e., just move the driver functionality from the kernel to user space). It is more complicated if your peripherals rely on interrupts. The ideal implementation uses an FPGA SoC (such as Xilinx Zynq UltraScale+ or Intel Cyclone V), where you can implement your own I/O peripherals that communicate via registers and DMA only and do not need interrupts.
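
As a rough illustration of polling a memory-mapped peripheral from user space, here is a sketch built on a made-up register layout. Everything in it is an assumption: the offsets, the bit masks, and the names `RegisterBlock` and `poll_sample` are hypothetical and do not describe any real device. On real hardware the base pointer would come from a one-time `mmap` of the device's register window at startup (for example through a UIO device), and the device may need memory barriers that this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical register layout, expressed as 32-bit word offsets.
constexpr std::size_t REG_STATUS = 0;
constexpr std::size_t REG_DATA = 1;
constexpr std::size_t REG_CONTROL = 2;
constexpr std::uint32_t STATUS_DATA_READY = 1u << 0;  // hypothetical bit
constexpr std::uint32_t CONTROL_ACK = 1u << 0;        // hypothetical bit

// Thin wrapper around a mapped register window. `volatile` keeps the
// compiler from caching or eliding the loads and stores.
class RegisterBlock {
public:
    explicit RegisterBlock(volatile std::uint32_t* base) : base_(base) {}
    std::uint32_t read(std::size_t index) const { return base_[index]; }
    void write(std::size_t index, std::uint32_t value) const { base_[index] = value; }
private:
    volatile std::uint32_t* base_;
};

// Poll the device once, with no system call and no interrupt.
// Returns true and stores the sample if data was ready.
inline bool poll_sample(const RegisterBlock& regs, std::uint32_t& sample) {
    if ((regs.read(REG_STATUS) & STATUS_DATA_READY) == 0) return false;
    sample = regs.read(REG_DATA);
    regs.write(REG_CONTROL, CONTROL_ACK);  // tell the device the sample was taken
    return true;
}
```

For a quick check without hardware, a plain `volatile std::uint32_t` array can stand in for the register window. In the real-time loop, `poll_sample` would be called once per cycle.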