I'm playing around with async-profiler and perf and decided to measure kernel activity for DirectByteBuffer disk IO. Here is the code (written in Scala, but its Java version should be obvious):
val path = Paths.get("/tmp/test")
val buffer = ByteBuffer.allocateDirect(4096 * 4096)
def main(args: Array[String]): Unit = {
var fullReadsCount = 0
while (true) {
var bytesRead = 0
var ch: SeekableByteChannel = null
try {
ch = Files.newByteChannel(path)
while (bytesRead >= 0) {
bytesRead = ch.read(buffer)
buffer.clear()
}
} finally {
if (ch != null)
ch.close()
}
fullReadsCount += 1
if(fullReadsCount % 100 == 0) println(fullReadsCount)
}
}
I ran this code multiple times and did both perf and async-profiler and noticed the following result:
async-profiler
$~/profiler.sh -i 28169 -d 30 <pid> //.... stack traces ommited ns percent samples top 264788732 61.02% 9317 copy_user_enhanced_fast_string_[k] 41510919 9.57% 1467 generic_file_read_iter_[k] 9333863 2.15% 331 find_get_entry_[k] 4181131 0.96% 148 __radix_tree_lookup_[k] 4057194 0.94% 143 copy_page_to_iter_[k] 1860485 0.43% 63 __d_lookup_rcu_[k] 1610407 0.37% 50 _raw_spin_unlock_irqrestore_[k]perf
sudo perf record -F 31499 -g -p <pid> -- sleep 30
Averagely among all runs I did I noticed that copy_user_enhanced_fast_string percentage is different in perf and async-profiler 61.02% vs 77.65%
QUESTION: Why is the percentage different for copy_user_enhanced_fast_string sampled by perf and async-profiler? I tried to provide the equal condition (Frequency and sampling period and I did not run both of the profiler simulatenously. 31499 Hz ≈ 28169 nanos).
Or my interpretation of the result is wrong?

The selected profiling interval (28 μs) is too short.
Check
dmesg- there are probably kernel warnings likeasync-profilerdiffers fromperfin the mechanism of processing PMU events.perfjust collects samples in the ring buffer while async-profiler sends a signal to the process in order to invoke application-specific callback. Typically there is no visible difference to a user, but when profiling frequency is too high, signals may introduce additional noise and affect process scheduling.I suggest to increase profiling interval to at least 100 μs (10000 Hz). This should make measurements more reliable.