I'm playing around with async-profiler and perf and decided to measure kernel activity for DirectByteBuffer
disk IO. Here is the code (written in Scala, but its Java version should be obvious):
val path = Paths.get("/tmp/test")
val buffer = ByteBuffer.allocateDirect(4096 * 4096)
def main(args: Array[String]): Unit = {
var fullReadsCount = 0
while (true) {
var bytesRead = 0
var ch: SeekableByteChannel = null
try {
ch = Files.newByteChannel(path)
while (bytesRead >= 0) {
bytesRead = ch.read(buffer)
buffer.clear()
}
} finally {
if (ch != null)
ch.close()
}
fullReadsCount += 1
if(fullReadsCount % 100 == 0) println(fullReadsCount)
}
}
I ran this code multiple times and did both perf
and async-profiler
and noticed the following result:
async-profiler
$~/profiler.sh -i 28169 -d 30 <pid> //.... stack traces ommited ns percent samples top 264788732 61.02% 9317 copy_user_enhanced_fast_string_[k] 41510919 9.57% 1467 generic_file_read_iter_[k] 9333863 2.15% 331 find_get_entry_[k] 4181131 0.96% 148 __radix_tree_lookup_[k] 4057194 0.94% 143 copy_page_to_iter_[k] 1860485 0.43% 63 __d_lookup_rcu_[k] 1610407 0.37% 50 _raw_spin_unlock_irqrestore_[k]
perf
sudo perf record -F 31499 -g -p <pid> -- sleep 30
Averagely among all runs I did I noticed that copy_user_enhanced_fast_string
percentage is different in perf
and async-profiler
61.02%
vs 77.65%
QUESTION: Why is the percentage different for copy_user_enhanced_fast_string
sampled by perf
and async-profiler
? I tried to provide the equal condition (Frequency and sampling period and I did not run both of the profiler simulatenously. 31499 Hz ≈ 28169 nanos).
Or my interpretation of the result is wrong?
The selected profiling interval (28 μs) is too short.
Check
dmesg
- there are probably kernel warnings likeasync-profiler
differs fromperf
in the mechanism of processing PMU events.perf
just collects samples in the ring buffer while async-profiler sends a signal to the process in order to invoke application-specific callback. Typically there is no visible difference to a user, but when profiling frequency is too high, signals may introduce additional noise and affect process scheduling.I suggest to increase profiling interval to at least 100 μs (10000 Hz). This should make measurements more reliable.