Value is wrong the first time a pointer is dereferenced but correct after that

I'm running my application on a Zynq UltraScale+ MPSoC (Genesys ZU dev board). I have an accelerator in the PL that is connected to the PS through an AXI DMA in simple mode. The DMA accesses DDR memory through a normal, non-coherent FPD slave port on the PS, and the application runs on one of the A53 cores in the PS.

I've verified with an ILA that the data being written to the AXI slave port is correct. However, some of the data I read back in software was incorrect. At least part of that problem was the A53's data cache, so as a temporary workaround I disable the D-cache at the start of the program, which should rule the cache out. Now, though, the first time I print or read from the array of received data I get an incorrect value, while subsequent reads return the correct value. What gives? How is this happening?
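For context on the cache side: with a non-coherent port, the usual alternative to disabling the D-cache is to keep it on and invalidate the receive buffer's address range after the DMA completes (the Xilinx standalone BSP provides `Xil_DCacheInvalidateRange` for this). Range operations act on whole cache lines (64 bytes on the Cortex-A53), so the span actually affected can be larger than the buffer itself. The sketch below, using a hypothetical helper name and no Xilinx API, computes that span on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u /* line size used in the post */

/* Hypothetical type: the byte span a range-based cache operation touches. */
typedef struct {
    uintptr_t start; /* first byte of the first line touched */
    size_t len;      /* span length, always a multiple of CACHE_LINE_SIZE */
} line_span;

/* Hypothetical helper: widen [addr, addr + len) to whole cache lines,
   i.e. the memory an invalidate of that range would actually affect. */
static line_span cache_line_span(uintptr_t addr, size_t len)
{
    const uintptr_t mask = (uintptr_t)CACHE_LINE_SIZE - 1u;
    assert((CACHE_LINE_SIZE & mask) == 0u); /* line size must be a power of two */

    uintptr_t end = addr + len;              /* one past the last byte */
    uintptr_t end_up = (end + mask) & ~mask; /* round the end up */

    line_span s;
    s.start = addr & ~mask;                  /* round the start down */
    s.len = (size_t)(end_up - s.start);
    return s;
}
```

For the post's 80-float buffer at a line-aligned address the span is exactly 320 bytes (five lines); shift the same buffer by 4 bytes and it straddles six lines (384 bytes), so an invalidate would also discard whatever else lives in the edge lines.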

Using the Vitis debugger/memory viewer, I've verified that the correct data is present at the memory location I allocated and told the DMA to write to.

Below is a trimmed-down version of the program, with the parts that work correctly removed.

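#include <stdio.h>
#include <stdlib.h>
#include "xaxidma.h"   // XAxiDma_SimpleTransfer, XAXIDMA_* direction flags
#include "xil_cache.h" // Xil_DCacheDisable
#include "xstatus.h"   // XST_SUCCESS, XST_FAILURE

#define CACHE_LINE_SIZE 64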

int main(void)
{
    Xil_DCacheDisable();

    //A bunch of DMA initialization
    ...

    //Send data to accelerator through DMA, no issues here
    ...
    
    float* outputCorrelation;
    const size_t outputCorrelationSizeBytes = sizeof(*outputCorrelation) * 80;
    outputCorrelation = aligned_alloc(CACHE_LINE_SIZE, outputCorrelationSizeBytes);
    if(outputCorrelation == NULL) {
        printf("Aligned Malloc failed\n");
        return XST_FAILURE;
    }

    //Initiate data receive transfer first
    int result = XAxiDma_SimpleTransfer(&axiDma,(UINTPTR) outputCorrelation, outputCorrelationSizeBytes, XAXIDMA_DEVICE_TO_DMA);
    if(result != XST_SUCCESS) {
        return result;
    }

    //Send data - assembledData allocation isn't shown as no problems here
    result = XAxiDma_SimpleTransfer(&axiDma,(UINTPTR) assembledData, sizeof(*assembledData) * inLen, XAXIDMA_DMA_TO_DEVICE);
    if(result != XST_SUCCESS) {
        return result;
    }

    //Wait for completion interrupts from DMA
    ...

    for(size_t x = 0; x < 80; x++) {
        printf("[%zu]\t%f\n", x, outputCorrelation[x]);
    }
    return XST_SUCCESS;
}
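The post's buffer is 80 floats, i.e. 320 bytes, which is exactly five 64-byte lines, so the `aligned_alloc` call already yields whole lines. If the element count ever changes, a sketch like the one below (hypothetical helper names) keeps a receive buffer line-aligned and padded to a whole number of lines, so no other heap object shares its first or last line if the D-cache is later re-enabled and the buffer is invalidated by range. Padding also keeps the size a multiple of the alignment, which the C11 wording of `aligned_alloc` requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64u

/* Round a byte count up to a whole number of cache lines. */
static size_t round_up_to_line(size_t bytes)
{
    return (bytes + CACHE_LINE_SIZE - 1u) / CACHE_LINE_SIZE * CACHE_LINE_SIZE;
}

/* Hypothetical helper: allocate a line-aligned buffer that owns every line
   it touches. Stores the padded size in *padded_bytes (if non-NULL).
   Returns NULL on failure; release with free(). */
static void *alloc_dma_buffer(size_t bytes, size_t *padded_bytes)
{
    if (bytes > SIZE_MAX - (CACHE_LINE_SIZE - 1u)) {
        return NULL; /* rounding up would overflow */
    }
    size_t padded = round_up_to_line(bytes);
    if (padded == 0u) {
        padded = CACHE_LINE_SIZE; /* avoid a zero-size request */
    }
    assert(padded % CACHE_LINE_SIZE == 0u); /* size is a multiple of the alignment */

    void *p = aligned_alloc(CACHE_LINE_SIZE, padded);
    if (p != NULL && padded_bytes != NULL) {
        *padded_bytes = padded;
    }
    return p;
}
```

In the post's code this would replace the direct `aligned_alloc` call, with `outputCorrelationSizeBytes` still passed to `XAxiDma_SimpleTransfer` as the transfer length.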

The expected output is the value 4 for every element of the array.

Output:

[0] -nan
[1] 4.000000
[2] 4.000000
[3] 4.000000
[4] 4.000000
...
[79] 4.000000

If I print any element of the array before the for loop, the first value becomes correct and every value printed in the loop is correct. What's going on here, and how can I solve it?
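Because adding a print changes the outcome, it may help to scan the whole buffer first and print only afterwards, so the check adds as little work as possible between the completion wait and the first read. A minimal sketch with a hypothetical function name; NaN compares unequal to every value, so a `-nan` entry like `[0]` above is reported as a mismatch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: index of the first element that is not exactly
   `expected`, or `n` if every element matches. Because NaN != x is true
   for any x, NaN entries are reported as mismatches. */
static size_t first_mismatch(const float *buf, size_t n, float expected)
{
    assert(buf != NULL || n == 0u);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != expected) {
            return i;
        }
    }
    return n;
}
```

For example, `size_t bad = first_mismatch(outputCorrelation, 80, 4.0f);` right after the completion wait, with `bad` printed only once the scan is done.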

Edit: I thought the compiler might be optimizing away the read, since none of the functions directly write to the allocated array, so I tried marking the output buffer as volatile. This did not change the behavior.


I did some more testing by connecting my PL accelerator to the LPD ports of the PS so that I could run on the RPU instead of the APU. The exact same code running on the RPU gave the expected result. I suspect there are still cache coherency issues on the APU even though I disabled the D-cache there.

Something I didn't mention earlier: when I single-step through my code, the issue does not occur. When I'm still in the debugger but run through the critical sections without stepping, the issue does occur.
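Since single-stepping hides the problem, one timing-sensitive piece worth checking is the elided completion wait: the buffer should not be read until the completion flag set by the interrupt handler has been observed, and on bare metal that flag is typically declared `volatile` so the polling loop is not reduced to a single read. The host-side sketch below models the same ordering with a thread standing in for the DMA and its interrupt, using C11 atomics. All names are hypothetical, and it is an analogy rather than Xilinx driver code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define N_OUT 80

/* Host-side stand-ins (hypothetical): the buffer the DMA writes and the
   completion flag its interrupt handler would set. */
static float rx_buffer[N_OUT];
static atomic_int rx_done; /* static storage: starts at 0 */

/* Plays the role of the DMA engine plus its completion interrupt. */
static void *fake_dma(void *arg)
{
    (void)arg;
    for (size_t i = 0; i < N_OUT; i++) {
        rx_buffer[i] = 4.0f;
    }
    /* Release: all buffer stores above are visible to an acquiring reader. */
    atomic_store_explicit(&rx_done, 1, memory_order_release);
    return NULL;
}

/* Starts the fake transfer, waits for completion, and only then reads the
   buffer. Returns element 0, or -1.0f if the thread cannot be created. */
static float receive_and_read_first(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, fake_dma, NULL) != 0) {
        return -1.0f;
    }
    /* Acquire: the flag is re-read every iteration, and the buffer read
       below cannot be moved ahead of the point where the flag is seen. */
    while (atomic_load_explicit(&rx_done, memory_order_acquire) == 0) {
        /* spin until the "interrupt" has fired */
    }
    float first = rx_buffer[0];
    pthread_join(t, NULL);
    return first;
}
```

If the wait were skipped or optimized away, the read of `rx_buffer[0]` could race the writer, which is the kind of failure that disappears under single-stepping.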
