Linux DMA 32-bit dma_alloc_coherent wrong behavior when intel_iommu=on


I'm writing a Linux device driver to DMA data from an FPGA into CPU RAM over PCI Express. The system runs 64-bit CentOS 8.1 with kernel 4.18.0-147.3.1 on an Intel platform.

The implementation follows the DMA-API-HOWTO. The device supports only 32-bit DMA addressing, and the driver uses a consistent (coherent) mapping for the DMA descriptor ring. Accordingly, I set the DMA mask to inform the kernel about the device's DMA addressing capabilities.

pciRet = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
if (pciRet < 0) {
    printk(KERN_ERR "dma_set_mask_and_coherent returned: %d\n", pciRet);
    return -EIO;
}
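
To make the effect of the mask concrete, here is a small userspace sketch of what `DMA_BIT_MASK(32)` evaluates to and what it means for an address to fit under it. The function names `dma_bit_mask` and `addr_fits_mask` are made up for illustration; only the mask formula mirrors the kernel macro, and the range check is a simplification of what the DMA layer does.

```c
#include <stdint.h>

/* Userspace equivalent of the kernel's DMA_BIT_MASK(n) macro:
 * a value with the low n bits set (all 64 bits when n == 64). */
static uint64_t dma_bit_mask(unsigned n)
{
    return n == 64 ? ~0ULL : (1ULL << n) - 1;
}

/* Hypothetical helper: returns 1 if the whole range
 * [addr, addr + size - 1] is addressable under `mask`.
 * Assumes size > 0. */
static int addr_fits_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
    return addr + size - 1 <= mask;
}
```

With a 32-bit mask the limit is 0xffffffff, so every bus address handed to the device has to lie below 4 GiB, whatever the physical address of the backing memory is.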

The FPGA and the driver are designed to cycle through a ring of 240 buffer descriptors until a user-space program stops the transfer. The buffer descriptor ring and the data buffers are allocated with dma_alloc_coherent; they are mapped when the user-space program triggers transaction initialization and unmapped at the end.

#define DMA_BD_CNT 240
#define DMA_BD_WORD_SIZE 16
#define BUFFER_SIZE 65536

bdsize = sizeof(u32) * DMA_BD_WORD_SIZE * DMA_BD_CNT;

// Linked list buffer descriptor list
BdAddr = dma_alloc_coherent(&pdev->dev, bdsize, &BdDmaAddr, GFP_KERNEL);
if (!BdAddr) {
    printk(KERN_ERR "%s: failed to allocate coherent buffer\n", DRIVER_NAME);
    err = -ENOMEM;
    goto err1;
}

// actual buffers of size 64kB 
for (i = 0; i < DMA_BD_CNT; i++) {
    RxData[i] = dma_alloc_coherent(&pdev->dev, BUFFER_SIZE, (dma_addr_t *)&RxDmaHandle[i], GFP_KERNEL);
    if (!RxData[i]) {
      printk(KERN_ERR "rx page allocation failure\n");
      err = -ENOMEM;
      goto err2;
    }
}
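
As a sanity check on the sizes involved, the arithmetic behind these allocations can be worked through in plain userspace C. This is only a sketch: the 4 KiB page size is an assumption (the x86-64 default), and the helper names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DMA_BD_CNT        240
#define DMA_BD_WORD_SIZE  16
#define BUFFER_SIZE       65536
#define ASSUMED_PAGE_SIZE 4096   /* assumption: 4 KiB pages */

/* Bytes needed for the descriptor ring: 240 descriptors of 16 32-bit words. */
static size_t bd_ring_bytes(void)
{
    return sizeof(uint32_t) * DMA_BD_WORD_SIZE * DMA_BD_CNT;
}

/* Number of pages needed to hold `bytes`, rounded up. */
static size_t pages_for(size_t bytes)
{
    return (bytes + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;
}

/* Total bytes requested for all data buffers together. */
static size_t rx_total_bytes(void)
{
    return (size_t)BUFFER_SIZE * DMA_BD_CNT;
}
```

Under these assumptions the descriptor ring is 15360 bytes (4 pages when rounded up), each data buffer spans 16 pages, and the 240 data buffers add up to 15 MiB.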

I'm facing two situations:

  1. Our application works fine when SWIOTLB is enabled. This prevents the use of the hardware VT-d IOMMU and is the default on Intel machines:

    intel_iommu=off iommu=soft
    
  2. However, it does not work when the Intel VT-d IOMMU is enabled:

    intel_iommu=on
    

    The first time I run the transaction everything works, but on every subsequent DMA start the buffer initialization fails: dma_alloc_coherent consistently triggers PTE errors (reported by DMAR) with the following message for each of the 16 pages it tries to map for each 64 kB buffer:

    [  164.470945] DMAR: ERROR: DMA PTE for vPFN 0xfef90 already set (to 1050b3f003 not 1057e1f003)
    [  164.470950] WARNING: CPU: 1 PID: 14017 at drivers/iommu/intel-iommu.c:2321 __domain_mapping.cold.88+0x46/0x4d
    [  164.470950] Modules linked in: dma_ch_dev(OE) 8021q garp mrp stp llc dell_rbu dcdbas dell_smbios dell_wmi_descriptor wmi_bmof intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf wdat_wdt pcspkr ipmi_ssif sg i2c_i801 lpc_ich ftdi_sio mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod mgag200 mlx5_core i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci libahci mlxfw drm tg3 libata megaraid_sas dm_mirror dm_region_hash dm_log dm_mod sctp libcrc32c crc32c_intel
    [  164.470963] CPU: 1 PID: 14017 Comm: dma-app Kdump: loaded Tainted: G        W  OE    --------- -  - 4.18.0-147.3.1.el8_1.x86_64 #1
    [  164.470964] Hardware name:  /01YM03, BIOS 2.2.11 06/13/2019
    [  164.470965] RIP: 0010:__domain_mapping.cold.88+0x46/0x4d
    [  164.470965] Code: 4c 24 08 e8 88 b3 bb ff 8b 05 54 48 de 00 4c 8b 4c 24 08 4c 8b 54 24 10 4c 8b 44 24 18 85 c0 74 09 83 e8 01 89 05 38 48 de 00 <0f> 0b e9 e6 ce ff ff 89 da 4c 89 c1 48 c7 c6 70 c8 4a 87 48 c7 c7
    [  164.470966] RSP: 0018:ffffb6f620bcf538 EFLAGS: 00010246
    [  164.470966] RAX: 0000000000000000 RBX: 0000001057e1f003 RCX: 0000000000000006
    [  164.470967] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff985ddf216a00
    [  164.470967] RBP: ffff985dddd1dcf8 R08: 0000000000000001 R09: ffff985dddd1dc80
    [  164.470968] R10: 0000000001057e1f R11: 00000000000ee800 R12: 0000000000000001
    [  164.470968] R13: 0000000000000000 R14: 00000000000fef9f R15: ffff985dd993ee00
    [  164.470969] FS:  00007efd1ca2a740(0000) GS:ffff985ddf200000(0000) knlGS:0000000000000000
    [  164.470969] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  164.470969] CR2: 00007efd1ac035f0 CR3: 0000000844df4003 CR4: 00000000007606e0
    [  164.470970] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  164.470970] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  164.470970] PKRU: 55555554
    [  164.470971] Call Trace:
    [  164.470972]  domain_mapping+0x1b/0xe0
    [  164.470973]  __intel_map_page+0xf1/0x140
    [  164.470975]  intel_alloc_coherent+0x96/0x120
    [  164.470976]  descriptor_init+0x4b/0xe0 [dma_ch_dev]
    [  164.470978]  dma_register+0x147/0x210 [dma_ch_dev]
    [  164.470983]  ? 0xffffffffc0836000
    [  164.470985]  dma_start.cold.25+0x77/0x28d [dma_ch_dev]
    [  164.470985]  ? 0xffffffffc0836000
    [  164.470995]  ? get_page_from_freelist+0xd87/0x1210
    [  164.470997]  ? get_page_from_freelist+0xd87/0x1210
    [  164.470999]  ? mem_cgroup_commit_charge+0x7a/0x560
    [  164.471000]  ? mem_cgroup_try_charge+0x8b/0x1a0
    [  164.471001]  ? mem_cgroup_throttle_swaprate+0x17/0x10e
    [  164.471003]  ? do_anonymous_page+0x1d2/0x370
    [  164.471004]  ? __handle_mm_fault+0x66e/0x6b0
    [  164.471006]  ? lookup_fast+0xc8/0x2f0
    [  164.471009]  ? update_load_avg+0x87/0x590
    [  164.471011]  ? account_entity_enqueue+0xc5/0xf0
    [  164.471011]  ? enqueue_entity+0xf6/0x630
    [  164.471013]  ? legitimize_path.isra.44+0x2d/0x60
    [  164.471015]  ? enqueue_task_fair+0x7d/0x3e0
    [  164.471016]  ? select_idle_sibling+0x22/0x3d0
    [  164.471017]  ? check_preempt_curr+0x7a/0x90
    [  164.471017]  ? ttwu_do_wakeup+0x19/0x130
    [  164.471019]  ? try_to_wake_up+0x54/0x4b0
    [  164.471019]  ? filename_lookup.part.64+0xe0/0x170
    [  164.471021]  ? tty_insert_flip_string_fixed_flag+0x85/0xe0
    [  164.471023]  ? pty_write+0x78/0x90
    [  164.471025]  ? __wake_up_common_lock+0x89/0xc0
    [  164.471026]  do_vfs_ioctl+0xa4/0x630
    [  164.471028]  ? syscall_trace_enter+0x1d3/0x2c0
    [  164.471029]  ksys_ioctl+0x60/0x90
    [  164.471030]  __x64_sys_ioctl+0x16/0x20
    [  164.471031]  do_syscall_64+0x5b/0x1b0
    [  164.471032]  entry_SYSCALL_64_after_hwframe+0x65/0xca
    [  164.471033] RIP: 0033:0x7efd1ac9ceab
    [  164.471034] Code: 0f 1e fa 48 8b 05 dd 9f 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ad 9f 2c 00 f7 d8 64 89 01 48
    [  164.471034] RSP: 002b:00007fff47df73a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    [  164.471035] RAX: ffffffffffffffda RBX: 00000000025b3ca0 RCX: 00007efd1ac9ceab
    [  164.471035] RDX: 0000000000000000 RSI: 0000000000005302 RDI: 0000000000000003
    [  164.471035] RBP: 0000000000000000 R08: 000000000000000a R09: 0000000000000001
    [  164.471036] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    [  164.471036] R13: 0000000000000498 R14: 00000000025b3a60 R15: 0000000000000002
    [  164.471037] ---[ end trace d8ffc4ec65fb8ee9 ]---
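
For readers decoding the DMAR line above, here is a small userspace sketch of how the numbers in it break down. It assumes 4 KiB VT-d pages and that the low two PTE bits are the read and write permissions; the helper names are hypothetical, and the decoding is simplified (it ignores any attribute bits in the upper part of the entry).

```c
#include <stdint.h>

#define VTD_PAGE_SHIFT 12          /* assumption: 4 KiB VT-d base pages */
#define PTE_READ   0x1ULL          /* bit 0: read permission */
#define PTE_WRITE  0x2ULL          /* bit 1: write permission */

/* I/O virtual address that corresponds to a given vPFN. */
static uint64_t vpfn_to_iova(uint64_t vpfn)
{
    return vpfn << VTD_PAGE_SHIFT;
}

/* Physical page address stored in a PTE (low 12 bits masked off). */
static uint64_t pte_phys_addr(uint64_t pte)
{
    return pte & ~((1ULL << VTD_PAGE_SHIFT) - 1);
}

/* Low 12 bits of a PTE, which hold the permission flags. */
static unsigned pte_flags(uint64_t pte)
{
    return (unsigned)(pte & ((1ULL << VTD_PAGE_SHIFT) - 1));
}
```

Applied to the log, vPFN 0xfef90 is IOVA 0xfef90000, which is below 4 GiB as the 32-bit mask requires. The two PTE values decode to physical pages 0x1050b3f000 and 0x1057e1f000, both with read and write set. As far as I can tell, the first value is the entry already present in the page table and the second is the one the new mapping tried to install, so the same IOVA page is being mapped to a different physical page while the old entry is still in place.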
    

In the second run, it looks like intel_alloc_coherent tries to reuse some of the already allocated buffers.

The dma_alloc_coherent call for the buffer descriptor ring does not produce any warning. The buffers are correctly freed at the end of every run with dma_free_coherent.

Are there specific strategies for handling 32-bit DMA with the Intel IOMMU enabled?

Thank you for any help you can provide.
