Achieve maximum bandwidth on arm64

I am trying to get close to the maximum memory bandwidth on my arm64 system, which has 4 cores and one DDR channel with a theoretical peak of 25.5 GB/s.

I tried running the following stress-ng benchmark:

./stress-ng --taskset 0xf --memrate 1 --memrate-wr-mbs 50000 --memrate-rd-mbs 30000 -t 60

But the maximum bandwidth I see is around 11000 MB/s, which is less than 50% of the theoretical maximum.
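To make the "less than 50%" figure concrete, here is a minimal sketch of the arithmetic. The helper name `percent_of_peak` is hypothetical, and it assumes both figures use decimal units (1 GB/s = 1000 MB/s).

```c
#include <assert.h>

/* Hypothetical helper: measured bandwidth as a percentage of the
   theoretical peak. Assumes decimal units (1 GB/s = 1000 MB/s). */
static double percent_of_peak(double measured_mb_s, double peak_gb_s) {
    assert(peak_gb_s > 0.0);
    return measured_mb_s / (peak_gb_s * 1000.0) * 100.0;
}
```

With the numbers above, `percent_of_peak(11000.0, 25.5)` comes out at roughly 43%.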

I also found this blog post about achieving maximum memory bandwidth, https://codearcana.com/posts/2013/05/18/achieving-maximum-memory-bandwidth.html, which uses the following routine:

    #include <stddef.h>

    void write_memory_rep_stosq(void* buffer, size_t size) {
        // size is in bytes and assumed to be a multiple of 8
        size_t count = size / 8;  // number of 8-byte stores
        // no cld needed: the x86-64 ABI guarantees DF=0 on function entry
        asm volatile("rep stosq"
            : "+D" (buffer), "+c" (count)  // rep stosq modifies RDI and RCX
            : "a" (0ULL)                   // 64-bit value to store
            : "memory");                   // the asm writes memory the compiler cannot see
    }

When I run it on x86_64, I get results that are really close to the peak bandwidth, thanks to modern x86 features like ERMSB that handle this with optimized microcode:

    $ ./memory_profiler
    write_memory_rep_stosq: 20.60 GiB/s
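For comparing write routines on either architecture, a small timing harness along these lines can help. This is a sketch under stated assumptions, not the blog's `memory_profiler`: the names `write_fn`, `write_memory_memset`, and `measure_write_gibs` are hypothetical, it uses a single thread, and it relies on POSIX `clock_gettime` and `posix_memalign` plus a GCC/Clang-style compiler barrier.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature for a routine that writes `size` bytes. */
typedef void (*write_fn)(void *buffer, size_t size);

/* Portable baseline writer; an ISA-specific routine can be swapped in. */
static void write_memory_memset(void *buffer, size_t size) {
    memset(buffer, 0, size);
}

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run `fn` over a `size`-byte buffer `iters` times and return GiB/s,
   or -1.0 if the buffer cannot be allocated. */
static double measure_write_gibs(write_fn fn, size_t size, int iters) {
    void *buffer = NULL;
    assert(size > 0 && iters > 0);
    if (posix_memalign(&buffer, 64, size) != 0) {
        return -1.0;
    }
    fn(buffer, size);  /* warm-up: fault the pages in before timing */
    double start = now_seconds();
    for (int i = 0; i < iters; i++) {
        fn(buffer, size);
        /* compiler barrier so the stores are not optimized away */
        __asm__ volatile("" : : "r"(buffer) : "memory");
    }
    double elapsed = now_seconds() - start;
    free(buffer);
    return ((double)size * (double)iters) / elapsed
           / (1024.0 * 1024.0 * 1024.0);
}
```

The buffer should be much larger than the last-level cache (for example 1 GiB), otherwise the number reflects cache bandwidth rather than DRAM bandwidth. A single thread may also be unable to saturate the memory controller, so running one instance per core is worth trying.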

But this is for x86_64. Is there an equivalent instruction for ARM64?
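One candidate worth testing, offered as a sketch rather than a confirmed answer, is the AArch64 `DC ZVA` instruction, which zeroes a whole block of memory whose size is reported by the `DCZID_EL0` register. The function name `write_memory_zero` is hypothetical, the code falls back to `memset` on other architectures or when `DC ZVA` is prohibited, and whether it gets close to peak bandwidth on this particular system is an assumption that would need measuring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero `size` bytes starting at `buffer`.
   On AArch64 it uses DC ZVA for whole aligned blocks; elsewhere,
   or if DC ZVA is prohibited, it falls back to memset. */
static void write_memory_zero(void *buffer, size_t size) {
#if defined(__aarch64__)
    uint64_t dczid;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(dczid));
    if (!(dczid & 0x10)) {  /* DZP bit clear: DC ZVA is permitted */
        /* BS field (bits 3:0) is log2 of the block size in 4-byte words */
        size_t block = (size_t)4 << (dczid & 0xf);
        char *p = buffer;
        char *end = p + size;
        /* bytes needed to reach the next block-aligned address */
        size_t head = (block - ((uintptr_t)p & (block - 1))) & (block - 1);
        if (head > size) {
            head = size;
        }
        memset(p, 0, head);
        p += head;
        while ((size_t)(end - p) >= block) {
            /* zero one naturally aligned block */
            __asm__ volatile("dc zva, %0" : : "r"(p) : "memory");
            p += block;
        }
        memset(p, 0, (size_t)(end - p));  /* unaligned tail */
        return;
    }
#endif
    memset(buffer, 0, size);
}
```

Unlike `rep stosq`, this can only write zeros. For other fill values, plain `stp` or NEON store loops (which is typically what an optimized `memset` uses) would be the thing to compare against.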
