On recent x86, RDTSC returns some pseudo-counter that measures time instead of clock cycles.
Given this, how do I measure actual clock cycles for the current thread/program?
Platform-wise, I prefer Windows, but a Linux answer works too.
This is not simple. The relevant behaviour is described in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Vol. 3B:
Here is the behaviour:
Here is the advice for your use case:
The bad news is that, AFAIK, performance counters are often not portable between AMD and Intel processors, so you will need to check the AMD documentation to find which counters to use there. There are also complications: you cannot easily measure the number of cycles taken by arbitrary code. For example, the processor can be halted or enter a sleep state for a short period of time (see C-states), or the OS can be executing protected code that cannot be profiled without high privileges (for the sake of security). This method is fine as long as you need to measure the number of cycles of numerically intensive code that runs for a relatively long time (at least several dozen cycles). On top of all that, the documentation and usage of MSRs are pretty complex, and there are some restrictions.
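On Linux, one way to avoid programming the MSRs by hand is the kernel's perf_event_open interface. The following is only a minimal sketch, not a definitive implementation: the helper names (open_cycle_counter, start_counter, stop_counter) are made up for illustration, and it assumes the generic PERF_COUNT_HW_CPU_CYCLES event is backed by an unhalted core-cycle counter on your CPU, which is typically the case but should be checked. It counts user-mode cycles for the calling thread only.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a core-cycle counter for the calling thread.
 * Returns a file descriptor, or -1 on failure (for example when
 * /proc/sys/kernel/perf_event_paranoid forbids it, or when no hardware
 * PMU is exposed, as in many virtual machines). */
int open_cycle_counter(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_CPU_CYCLES; /* generic "CPU cycles" event */
    attr.disabled = 1;                      /* start stopped */
    attr.exclude_kernel = 1;                /* user-mode cycles only */
    attr.exclude_hv = 1;                    /* ignore hypervisor cycles */

    /* pid = 0 and cpu = -1: follow this thread on whichever CPU it runs.
     * glibc has no wrapper for this system call, hence syscall(). */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Hypothetical helper: zero the counter and start counting. */
int start_counter(int fd)
{
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) == -1)
        return -1;
    return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
}

/* Hypothetical helper: stop counting and fetch the cycle count.
 * Returns 0 on success, -1 on failure. */
int stop_counter(int fd, uint64_t *cycles)
{
    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) == -1)
        return -1;
    ssize_t n = read(fd, cycles, sizeof *cycles);
    return n == (ssize_t)sizeof *cycles ? 0 : -1;
}
```

A caller would open the counter once, call start_counter just before the code of interest and stop_counter just after it, then close the descriptor. The count includes the overhead of the ioctl calls themselves, so it is only meaningful for regions much longer than that overhead.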
Performance counters like CPU_CLK_UNHALTED.THREAD and CPU_CLK_UNHALTED.REF_TSC seem a good start for what you want to measure. Using a library to read such performance counters is generally a very good idea (unless you like having a headache for at least a few days). PAPI might be enough to do the job.
Here are some interesting related posts: