I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell) processor. On an otherwise relatively idle system, I ran the following perf command for around 5 seconds against each workload. The counters are offcore_response.all_data_rd.l3_miss.local_dram and offcore_response.all_code_rd.l3_miss.local_dram:
sudo perf stat -e offcore_response.all_data_rd.l3_miss.local_dram,offcore_response.all_code_rd.l3_miss.local_dram -p <PID>
The workloads are: 1) playing a video in VLC and 2) running the KDevelop indexer on a large code base. The outputs are shown below:
VLC:
Performance counter stats for process id '14617':
1,621,980 offcore_response.all_data_rd.l3_miss.local_dram
1,611,825 offcore_response.all_code_rd.l3_miss.local_dram
4.993841802 seconds time elapsed
KDevelop:
Performance counter stats for process id '23294':
31,006,390 offcore_response.all_data_rd.l3_miss.local_dram
10,236,222 offcore_response.all_code_rd.l3_miss.local_dram
5.095681532 seconds time elapsed
Based on these statistics (data and code reads combined), the memory access frequency of KDevelop is more than 12 times that of VLC.
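To make the arithmetic behind that ratio explicit, here is a minimal sketch that sums the two counters for each process and divides the totals. The counts are copied from the perf output above; the variable names are only illustrative, and adding data reads to code reads is my assumption about how to combine them.

```python
# Counts copied from the perf stat output above (each run lasted about 5 s).
vlc_counts = {"all_data_rd": 1_621_980, "all_code_rd": 1_611_825}
kdevelop_counts = {"all_data_rd": 31_006_390, "all_code_rd": 10_236_222}

# Assumption: data reads and code reads are simply added together.
vlc_total = sum(vlc_counts.values())
kdevelop_total = sum(kdevelop_counts.values())

perf_ratio = kdevelop_total / vlc_total
print(f"VLC total:      {vlc_total:,}")
print(f"KDevelop total: {kdevelop_total:,}")
print(f"KDevelop / VLC: {perf_ratio:.2f}")
```

Since both runs lasted roughly 5 seconds, the ratio of raw counts is close to the ratio of per-second rates.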
However, the IMC counter statistics (retrieved using PCM) are at odds with the performance counters above. On the idle system, the total system bandwidth is around 2.65GB (READ: 2.30GB, WRITE: 0.35GB). The total system bandwidth for each workload (each run separately) is as follows:
VLC:
around 8.40GB (READ: 4.65GB, WRITE: 3.75GB)
KDevelop:
around 3.75GB (READ: 3.15GB, WRITE: 0.60GB)
After subtracting the idle system bandwidth, the VLC and KDevelop bandwidths are around 5.75GB and 1.10GB, respectively. By this measure, the VLC memory access frequency is more than 5 times that of KDevelop, which is the opposite of what the perf counters suggest.
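The subtraction can be sketched as follows, using the PCM figures quoted above. Treating the idle figure as a constant baseline that can be subtracted from each measurement is an assumption on my part, and the names are only illustrative.

```python
# Total system bandwidth figures reported by PCM, in the same GB unit as above.
idle_total = 2.65
measured_total = {"VLC": 8.40, "KDevelop": 3.75}

# Assumption: background traffic stays at the idle level while a workload runs.
net = {name: round(total - idle_total, 2) for name, total in measured_total.items()}

imc_ratio = net["VLC"] / net["KDevelop"]
for name, value in net.items():
    print(f"{name}: {value:.2f} GB above idle")
print(f"VLC / KDevelop: {imc_ratio:.2f}")
```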
How can these two outcomes be explained?