My machine has two AMD EPYC 9654 CPUs with DDR5 memory running at 4800 MT/s across 24 channels in total (12 per socket). The theoretical peak memory bandwidth of the machine is therefore 921.6 GB/s (4800 MT/s * 8 bytes * 24 channels / 1000).
However, when I measured the relationship between latency and bandwidth with Intel Memory Latency Checker (MLC), I found that once memory bandwidth exceeds 388 GB/s (42.1% of the theoretical peak), memory latency starts to climb noticeably, which hurts software performance. Once bandwidth exceeds 700 GB/s, latency goes above 1000 ns and the software becomes completely unusable.
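To make the arithmetic behind these figures easy to check, here is a minimal Python sketch. It assumes 8 bytes per transfer per channel (a 64-bit data path per DDR5 channel) and decimal gigabytes; the function name `peak_bandwidth_gbs` is only illustrative.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: int,
                       bytes_per_transfer: int,
                       channels: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal units)."""
    # MT/s * bytes per transfer gives MB/s per channel; divide by 1000 for GB/s.
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


# Values taken from the description above: DDR5-4800, 24 channels in total.
peak = peak_bandwidth_gbs(4800, 8, 24)

# Bandwidth levels quoted in the text, expressed as a share of the peak.
knee_util = 388 / peak * 100
saturated_util = 700 / peak * 100

print(f"theoretical peak: {peak:.1f} GB/s")       # 921.6 GB/s
print(f"388 GB/s is {knee_util:.1f}% of peak")    # 42.1%
print(f"700 GB/s is {saturated_util:.1f}% of peak")  # 76.0%
```

This is the raw pin bandwidth of the DIMMs. It does not account for refresh, read/write turnaround, or other protocol overhead, so the sustainable figure is expected to be lower.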
Why does memory latency start to rise significantly while bandwidth utilization is still below 50%? Which hardware component imposes this limit: the CPU, the MMU, the DIMMs, or the DRAM itself?
./mlc --loaded_latency
Inject delay  Latency (ns)  Bandwidth (MB/sec)
==========================
00000 1135.83 736300.0
00002 1147.45 736493.2
00008 1150.77 736122.9
00015 1149.88 734617.0
00050 1181.60 734202.9
00100 1123.81 734903.9
00200 1112.25 734429.5 <=
00300 173.65 631984.1
00400 155.37 480770.5
00500 144.68 388157.8 <=
00700 140.57 280308.7
01000 140.71 197585.1
01300 137.59 153031.9
01700 136.29 117527.8
02500 135.91 80356.0
03500 134.87 57644.2
05000 134.92 40557.2
09000 134.71 22780.5
20000 133.78 10525.7
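For anyone who wants to reproduce the percentages, here is a rough Python sketch that parses the table above and reports the highest bandwidth at which latency is still close to the idle latency. The 10% tolerance and the names `MLC_OUTPUT`, `PEAK_GBS`, and `TOLERANCE` are my own assumptions for illustration, not something MLC defines.

```python
# Rows copied from the MLC output above: inject delay, latency (ns), bandwidth (MB/s).
MLC_OUTPUT = """\
00000 1135.83 736300.0
00002 1147.45 736493.2
00008 1150.77 736122.9
00015 1149.88 734617.0
00050 1181.60 734202.9
00100 1123.81 734903.9
00200 1112.25 734429.5 <=
00300 173.65 631984.1
00400 155.37 480770.5
00500 144.68 388157.8 <=
00700 140.57 280308.7
01000 140.71 197585.1
01300 137.59 153031.9
01700 136.29 117527.8
02500 135.91 80356.0
03500 134.87 57644.2
05000 134.92 40557.2
09000 134.71 22780.5
20000 133.78 10525.7
"""

PEAK_GBS = 4800 * 8 * 24 / 1000  # theoretical peak, 921.6 GB/s
TOLERANCE = 1.10                 # assumed: "close to idle" means within 10%

rows = []
for line in MLC_OUTPUT.splitlines():
    parts = line.split()
    if len(parts) < 3:
        continue
    # Ignore anything after the third field (the "<=" annotations).
    rows.append((int(parts[0]), float(parts[1]), float(parts[2])))

# Idle latency: the latency of the row with the lowest bandwidth.
idle_latency = min(rows, key=lambda r: r[2])[1]

# Highest-bandwidth row whose latency is still within the tolerance.
unloaded = [r for r in rows if r[1] <= TOLERANCE * idle_latency]
knee = max(unloaded, key=lambda r: r[2])
knee_util = knee[2] / 1000 / PEAK_GBS * 100

# Highest bandwidth reached at any injection delay.
max_bw = max(r[2] for r in rows)
max_util = max_bw / 1000 / PEAK_GBS * 100

print(f"idle latency: {idle_latency:.2f} ns")
print(f"knee: {knee[2] / 1000:.1f} GB/s at {knee[1]:.2f} ns ({knee_util:.1f}% of peak)")
print(f"max bandwidth: {max_bw / 1000:.1f} GB/s ({max_util:.1f}% of peak)")
```

With a 10% tolerance the knee lands on the 388 GB/s row that I marked; a stricter or looser tolerance would move it to a neighbouring row.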