OpenOnload throws segmentation fault


I installed OpenOnload from source, following the official documentation, to accelerate networking.

$ onload

Kernel module: 8.1.1.17
$ find /lib/modules/`uname -r` -type f -name '*.ko' -printf '%f\n' | grep -E 'sfc|onload'
sfc_driverlink.ko
sfc.ko
sfc_resource.ko
sfc_char.ko
onload.ko

As you can see above, it is installed system-wide and the onload command works as expected. However, when I run my application with onload <app-exe>, it gives the following error:

Segmentation fault (core dumped)

What else should I check to diagnose the issue? I also have SF NICs attached to my server. I want to achieve kernel bypass using those NICs and OpenOnload.

$ lsmod | grep onload
onload                835584  4
sfc_char              143360  1 onload
sfc_resource          249856  2 onload,sfc_char

$ lsmod | grep sfc
sfc_char              143360  1 onload
sfc_resource          249856  2 onload,sfc_char
sfc                   864256  0
vdpa                   32768  1 sfc
sfc_driverlink         16384  2 sfc,sfc_resource
mtd                    90112  8 sfc

There is 1 best solution below

bunnywarren

It's not specified when the segmentation fault occurs, or whether the application works correctly without the onload prefix, but I will assume it does.

I would suggest the first step is running a known application with Onload to confirm the installation is working as expected. An easy way to do this would be to accelerate an ncat instance with the following command (provide a suitable port number):

onload ncat -l <portnum>

If this fails to start, it would indicate an issue with the Onload installation, in which case I would suggest uninstalling and reinstalling Onload.
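To tell these cases apart (the application alone, a known application under Onload, the application under Onload), it can help to look at the exit status rather than only the message on the terminal. The following is a minimal sketch, assuming a bash- or dash-style shell on Linux, where a process killed by signal N is reported as exit status 128+N (SIGSEGV is 11, hence 139). The helper name classify_exit is hypothetical and not part of Onload.

```shell
#!/bin/sh
# classify_exit (hypothetical helper): run a command and print how it ended.
# Assumes the shell reports "killed by signal N" as exit status 128+N,
# so a SIGSEGV (signal 11 on Linux) shows up as 139.
classify_exit() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -eq 139 ]; then
        echo "segfault"
    else
        echo "exit $status"
    fi
}

# Intended use, with the placeholders from the question and answer:
#   classify_exit <app-exe>                  # application without Onload
#   classify_exit onload ncat -l <portnum>   # known application with Onload
#   classify_exit onload <app-exe>           # application with Onload
```

If only the last of the three reports "segfault", that is consistent with the installation being fine and the problem lying in how the application interacts with Onload.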

If it does work, it points to something within the application causing issues for Onload. In this scenario, I would recommend running with the "safe" profile to rule out any concurrency problems that could be present. To do this, add "--profile=safe" between "onload" and "app-exe":

onload --profile=safe <app-exe>

It sets Onload options that add additional checks for calls that could corrupt data if used concurrently, e.g. modifying a file descriptor. If this works, I recommend checking for calls that modify a socket and could occur concurrently, and seeing if they can be removed. Alternatively, setting EF_FDS_MT_SAFE=0 would prevent these calls from conflicting. The Onload User Guide offers additional information on this option: https://docs.xilinx.com/r/en-US/ug1586-onload-user/EF_FDS_MT_SAFE
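Onload reads its EF_* options from the environment of the accelerated process, so the option can be set for a single run without exporting it for the whole shell session. A minimal sketch follows; the wrapper name run_mt_unsafe is hypothetical and only illustrates scoping the variable to one command.

```shell
#!/bin/sh
# run_mt_unsafe (hypothetical helper): run a command with EF_FDS_MT_SAFE=0
# set in that command's environment only; the calling shell is unchanged.
run_mt_unsafe() {
    env EF_FDS_MT_SAFE=0 "$@"
}

# Intended use, with the placeholder from the answer:
#   run_mt_unsafe onload <app-exe>
# which is equivalent to typing:
#   EF_FDS_MT_SAFE=0 onload <app-exe>
```

If the application runs cleanly with only this option set, that narrows the problem to concurrent file descriptor modification, as opposed to the other settings the "safe" profile changes.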