ARM v7 cache behavior and usage


I'm modifying U-Boot to support PCIe + NVMe on a Cyclone V SX. I got it working through a series of hacks to the socfpga fork of version 2020.10.

I have two issues that need to be resolved:

  1. U-Boot doesn't appear to understand that PCI address space and host address space are not necessarily the same. When the NVMe driver fetches the address assigned to the PCIe port, the PCI driver assumes that the address maps directly into main memory. It doesn't; it's mapped to a completely different address. I have a hack in place to deal with this. I'm going to try to add address translation to U-Boot's Device Tree utilities without breaking existing functionality.

  2. There are significant problems with the way the NVMe driver flushes and invalidates the cache to pick up changes to memory after the NVMe device has executed a command. During polling, the doorbell registers need to be flushed on the way out and invalidated on the way in, and the driver gets this wrong in several ways.

Here's where I need guidance.

There are two pieces of code in U-Boot that handle cache flushing and invalidation. One is part of the ARMv7 core code and the other is the PL310 cache controller code. The ARMv7 code assumes a cache line is 64 bytes, while the PL310 code assumes 32 bytes. What gives?

Both code paths are used by the NVMe code. I think the flush goes through the PL310 code and the invalidate goes through the ARMv7 code, but I may have that backwards. I'm still learning and could easily have mixed them up.

The problem with the NVMe driver is simple: it asks to flush an address range that is not cache-line aligned. Both code paths choke on it, and neither ends up flushing or invalidating the lines that actually need it. The PL310 code rounds the start address up and then does nothing. The ARMv7 code aligns start and stop to 64-byte boundaries and does nothing if they end up equal.

I've hacked around this by forcing 64-byte alignment in the NVMe code. For the ARMv7 call, I also make sure it over-flushes the cache by adding one to stop when it equals start.

If you're still following me, thanks; I really appreciate it.

What is the right way to fix this? None of this code is mine, but it seems to me that there are bugs in the cache code. The PL310 code should not add one to the start address; depending on how the for loop is written, perhaps it should add it to the stop address instead. The invalidate loop in the ARMv7 code should run at least once so that it covers the start location, so its condition should be <=, not <.

I can post the code if anyone is interested; just let me know.

But the big question is: why are there two ways to flush/invalidate the cache, and why are both being used? Since the Cyclone V uses a PL310 cache controller, it seems to me that its code should be used for both flush and invalidate. Which code gets called is selected by #define, and something tells me that should be fixed as well.

Opinions? Options? Ideas?

I'm open; please let me know your thoughts.

Answer

Since you're on something based on v2020.10, you're missing all of the cache-related changes in this commit: https://source.denx.de/u-boot/u-boot/-/commit/d0c04926cd054cf7360ec15913ac17a465f32603

As for the rest of what you raise, the U-Boot mailing list is the right forum. Please make sure to CC the people who have worked on the code before; the scripts/get_maintainer.pl tool will help with that.