I am trying to understand the behavior of the JMH sample JMHSample_32_BulkWarmup.
The result for JMHSample_12_Forking looks something like this:
Benchmark                       Mode  Cnt   Score   Error  Units
JmhForking.measure_1_c1         avgt    5   2.400 ± 0.110  ns/op
JmhForking.measure_2_c2         avgt    5  18.405 ± 1.174  ns/op
JmhForking.measure_3_c1_again   avgt    5  19.427 ± 0.526  ns/op
JmhForking.measure_4_forked_c1  avgt    5   2.133 ± 0.105  ns/op
JmhForking.measure_5_forked_c2  avgt    5   2.116 ± 0.067  ns/op
As explained here, the call becomes bimorphic once it is made with a second object type, so performance is still degraded for the third benchmark. All of this happens because the calls are made in the same JVM.
As I understand it (obviously wrongly), everything in JMHSample_32_BulkWarmup should happen as in the unforked benchmarks of JMHSample_12_Forking, because the bulk warmups and the measurement iterations run in the same JVM. I would expect:
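To make the pattern concrete, here is a minimal JMH-free sketch of the shared call site I have in mind. It is only an illustration: the class name CallSiteSketch and the method bodies are my own assumptions, loosely modeled on the samples, and it does not measure anything or reproduce the timings above.

```java
// Hypothetical, JMH-free sketch; names are illustrative and not taken from the JMH samples.
public class CallSiteSketch {

    interface Counter {
        int inc();
    }

    static class Counter1 implements Counter {
        private int x;

        @Override
        public int inc() {
            return x++;
        }
    }

    static class Counter2 implements Counter {
        private int x;

        @Override
        public int inc() {
            return x++;
        }
    }

    // The single shared call site: c.inc() is the virtual call whose
    // receiver types the JIT compiler profiles at run time.
    static int measure(Counter c) {
        int s = 0;
        for (int i = 0; i < 10; i++) {
            s += c.inc();
        }
        return s;
    }

    public static void main(String[] args) {
        Counter c1 = new Counter1();
        Counter c2 = new Counter2();

        // Same order as the unforked benchmarks: c1, then c2, then c1 again.
        System.out.println(measure(c1)); // only Counter1 has reached c.inc() so far
        System.out.println(measure(c2)); // a second receiver type reaches the same call site
        System.out.println(measure(c1)); // same JVM, so the call site has now seen both types
    }
}
```

Both implementations are deliberately identical, so any difference between the three calls can only come from how the JVM treats the call site, not from the work being done.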
- warmup benchmark 1 -> monomorphic inlining
- warmup benchmark 2 -> bimorphic inlining, degraded performance
- iterations -> I would expect degraded performance
So why do we have to break inlining in JMHSample_32_BulkWarmup to get the same behavior as in JMHSample_12_Forking?
If I remove the DONT_INLINE compiler control in JMHSample_32_BulkWarmup, I get:
# Run progress: 0.00% complete, ETA 00:05:00
# Fork: 1 of 1
# Warmup Iteration 1: 2.098 ns/op
# Warmup Iteration 2: 2.126 ns/op
# Warmup Iteration 3: 2.081 ns/op
# Warmup Iteration 4: 2.084 ns/op
# Warmup Iteration 5: 2.082 ns/op
# Warmup Iteration 1: 17.407 ns/op
# Warmup Iteration 2: 17.444 ns/op
# Warmup Iteration 3: 17.452 ns/op
# Warmup Iteration 4: 17.393 ns/op
# Warmup Iteration 5: 17.422 ns/op
Iteration 1: 2.060 ns/op
Iteration 2: 2.054 ns/op
Iteration 3: 2.065 ns/op
Iteration 4: 2.058 ns/op
Iteration 5: 2.055 ns/op
So things do not happen the way they do in JMHSample_12_Forking.
Tested with JDK 17.