I was trying to write a fairly generic testing framework for profiling a group of functions, but ran into an issue I couldn't place my finger on detailed below.
The concept is pretty simple. I created a Test
abstract class:
public abstract class Test {
private final String name;
public Test(String name) {
this.name = name;
}
public abstract void test();
}
Then, I have a master test class, with some configuration information and the loops for running the tests.
public class MyTestClass {
public static double staticMethod1(Quat4f q) {
double angle;
float dirW = q.w;
if (q.w * q.w + q.y * q.y > 0.0f) {
if (q.w > 0.f && q.y < 0.0f)
dirW *= -1.0f;
angle = 2.0f * Math.acos(dirW);
} else {
angle = 0.0f;
}
return angle / 6.283f * 100.f;
}
public static double staticMethod2(Quat4f q) {
AxisAngle4f axisAngle = new AxisAngle4f();
axisAngle.set(q);
return axisAngle.angle / 6.283f * 100.f;
}
public static final void main(String[] args) {
final Quat4f quaternion = new Quat4f(0, 0, 0, 1);
Test[] tests = new Test[] {
new Test("staticMethod1") {
@Override
public void test() {
staticMethod1(quaternion);
}
},
new Test("staticMethod2") {
@Override
public void test() {
staticMethod2(quaternion);
}
}
};
long startTime = 0;
int repeat = 10; //How many times to repeat each iteration.
int[] tiers = new int[] { 1000, 10000, 100000, 1000000 };
long[][][] times = new long[tests.length][tiers.length][iterations];
for (int testIndex = 0; testIndex < tests.length; testIndex++) {
for (int tierIndex = 0; tierIndex < tiers.length; tierIndex++) {
for (int r = 0; r < repeat; r++) {
startTime = System.nanoTime();
for (int i = 0; i < tiers[tierIndex]; i++) {
tests[testIndex].run(); //run the test
}
times[testIndex][tierIndex][r] = System.nanoTime() - startTime; //Stash the execution time in the array.
}
}
}
}
}
Looking at this code, you are probably wondering why the Test.run()
method calls a static method. That's because originally I was just embedding the hardcoded call to the static method inside of the loop. Instead of calling tests[testIndex].run()
I would call staticMethod1(quaternion)
explicitly there. That also meant redundantly duplicating this loop code for each method - which led me to creating the abstract base class so that I could essentially create an array of callbacks to test.
Observations
So, after switching to using the callback Test
class, instead of calling the static function explicitly, I observed two things:
- The execution times of each call increased greatly
- The execution times were not nearly as consistent.
Static Results
Here is a table of output from when I was calling the static methods explicitly (all times in nano seconds):
1000 10000 100000 1000000
staticMethod1
315358 424208 1451141 14495864
125334 410525 1483797 14657896
125956 412079 1445543 14702681
150837 413012 1473845 14677179
126578 412080 1419419 14415313
126268 413323 1450830 14600361
125645 411457 1437147 14504261
126889 414257 1431548 14402563
129689 420476 1476954 14548734
848417 425453 1471046 14409715
staticMethod2
1528581 1031287 3137712 2180755
228899 303540 218947 2227406
228276 303228 218324 2232071
235118 301362 218324 2174224
226411 317534 218636 2180133
226410 302918 218946 2143434
253779 302606 219879 2116689
226099 349257 219880 2108913
240717 303228 218946 2150899
250358 303228 218946 2159607
Callback Results
And here is the same code, only executed from my abstract base Test
class:
1000 10000 100000 1000000
staticMethod1
360453 454686 1985445 15699447
155191 449400 1639298 15048205
152391 449089 1576165 15128134
175095 451888 1537289 15300429
156746 466816 1600734 15190645
157989 464950 1641476 15483610
157368 452198 1559681 15316290
157990 460285 1572122 15402439
157367 527773 1538222 15078995
878274 454065 1548485 15077439
staticMethod2
1519562 1101263 1674130 8842756
274616 335883 1481309 8728930
285190 339616 1471046 8842135
291721 334950 1280089 8591155
294831 347391 1339491 13402065
332152 343970 1299683 10950426
300429 326553 1252100 7778814
285190 324999 1365615 8569385
297008 341792 1284133 7734030
283324 326554 1327984 11505256
Question Revisited
The only conjecture I was able to form was that perhaps this had something to do with the Java stack frame concept. I'm really looking for someone to give an in depth analysis of why this would happen.
What could be the result of these dramatically different results?
I re-ran the tests from my home computer, so I thought I would re-post those results for accuracy.
Callback (staticMethod1, staticMethod2)
1000 10000 100000 1000000
staticMethod1
629020 688864 3016204 40796517
348542 589891 2662401 39673949
355447 611921 3559403 39613447
416936 617511 4022701 39335929
412660 635267 4290355 38862108
409702 751996 4055583 38823967
405426 761202 4063803 38998238
410030 760545 4024016 39131407
411346 656640 3877366 38737489
1794991 723060 4139759 38286028
staticMethod2
2219818 4198617 2526272 15647236
735555 1939011 2651879 14482251
761860 445542 2480238 12990096
734569 222607 2437822 14278058
734898 264366 2323394 23561771
743118 220633 2739672 15669266
746734 224909 5159080 12916113
781918 223593 2342794 14843616
789481 229512 2740658 13400784
865108 227210 5202155 22015033
Callback (staticMethod2, staticMethod1)
1000 10000 100000 1000000
staticMethod2
2159974 1598690 4343951 4011522
755284 484013 4646131 3989491
779945 460667 390302 4114111
866752 469874 413318 3833963
911141 495193 433376 4024016
878918 468230 424827 4162118
892070 452447 431074 3830346
806579 419894 463298 4003301
814142 424169 424826 3961871
830253 417593 432718 3823112
staticMethod1
768437 632637 4596480 38401771
421539 655325 3811603 37536663
418579 657626 3917481 37377517
425813 648091 3887230 37347924
423512 653023 3800095 38334692
428772 570820 3810288 37640568
435020 581013 3795162 37543896
426800 578382 3805027 37390670
448830 567861 4004617 37502466
1883443 663874 3848101 38096961
Callback (staticMethod1, staticMethod2) with single instantiation of AxisAngle4f
1000 10000 100000 1000000
staticMethod1
693138 745420 4382752 26091003
405098 677355 3378227 41866476
390630 669793 4349213 42472807
430088 699057 4296931 27899147
385697 675711 4300549 42643790
382410 658941 4296603 32330563
393918 662888 2602557 42622417
380437 666833 2588747 32903026
393918 738515 2616367 26079823
1805843 679985 2570004 42191343
staticMethod2
444556 1640449 963422 8620168
463298 464942 946325 8545856
431732 474478 877931 8645487
452776 466915 870698 8761229
432718 449487 882534 8572490
443898 464613 876288 8482066
414633 538596 871684 8672121
408715 190054 876287 8626744
405427 96342 874643 8607016
436664 96343 847681 8543883
These results aren't really "dramatic." The averages for the 1000000 runs are:
The fact that the slowdown between these is off by (almost exactly) an order of magnitude is weird, but the big thing is that the callback approach slowed down by at most about 7 ns. That's not a whole lot.
UPDATE
More results you have posted, with the order of testing
method2
thenmethod1
, are consistent with the following:Test
class;The speed advantage of a monomorphic site is both in avoiding virtual function table lookup, and in allowing method inlining to happen. Both your methods are quite short and good candidates for inlining.
You could further verify that such recompiling happens once, but only once, by using
--XX:+PrintCompilation
combined with aprintln
at the moment you switch from one test to the other.Further notes
You should avoid your timings to be influenced by such intricacies of the JVM. To that end, use a
test(int iterations)
method and push your innermost loop into it. This will allow the critical method dispatch to happen only once per at least 1,000 iterations, and become insignificant;mistrust any time measurement below 1 ms.
System.nanoTime
's accuracy is nowhere close to 1 ns: its granularity is usually around 1 µs, and to get good accuracy you must be well above that.