A simple matrix benchmark has indicated that Revolution Analytics R 2.13.2's LU decomposition is nearly 5 times slower than matrix multiplication. Theory and many years of practice show that LU should take 1/3 to 2/3 the time of A*A. Revo R and Matlab use Intel's Math Kernel Library (MKL) for this test; R 2.14.1 does not use an optimized BLAS. Everything is 64-bit.
The anomaly shows up in Table 2 below, which is Table 1 normalized by each system's A*A time. There are other apparent anomalies, but LU is the most glaring one.
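The 1/3 figure follows from standard dense-matrix flop counts; a quick back-of-the-envelope check in R:

```r
# Standard flop counts for an n x n dense matrix:
#   matrix multiply A %*% A : ~2*n^3 flops
#   LU factorization        : ~(2/3)*n^3 flops
# So on a flop-bound system, LU should take roughly 1/3 the time of A*A.
n <- 1000
flops_mult <- 2 * n^3
flops_lu <- (2 / 3) * n^3
flops_lu / flops_mult  # 1/3
```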
Table 1 (secs)
                 A*A     LU      A\b     Det     Inv
----------------------------------------------------
R 2.14.1         0.757   0.43    0.45    0.20    1.11
Revo R 2.13.2    0.063   0.35    0.11    0.03    0.14
Matlab 2011b     0.062   0.08    0.10    0.07    0.16
----------------------------------------------------
Averaged over 20 runs on a 1000x1000 random matrix
Table 2 (normalized)
                 A*A     LU      A\b     Det     Inv
----------------------------------------------------
R 2.14.1         1       0.57    0.19    0.26    1.47
Revo R 2.13.2    1       4.67*   1.58    1.33    2.17
Matlab 2011b     1       0.67    1.72    0.61    1.68
----------------------------------------------------
Note: x = A\b in Matlab is x <- solve(A,b) in R.
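For reference, a minimal check of that equivalence, using the base R `solve()`:

```r
# Matlab's x = A\b corresponds to x <- solve(A, b) in R.
set.seed(1)
A <- matrix(rnorm(1e6), 1000, 1000)
b <- rnorm(1000)
x <- solve(A, b)
max(abs(A %*% x - b))  # residual; should be near machine precision
```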
UPDATE: Following Simon Urbanek's advice, I have replaced LUP = expand(lu(Matrix(A))) with lu(A). The Revo R rows are now:
Revo R 2.13.2
            A*A     LU      A\b     Det     Inv
---------------------------------------------------
time        0.104   0.107   0.110   0.042   0.231
norm time   1.000   1.034   1.060   0.401   2.232
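For clarity, the two timed expressions do different amounts of work beyond the factorization itself (a sketch; lu() and expand() come from the recommended Matrix package):

```r
library(Matrix)  # provides lu() and expand()
A <- matrix(rnorm(1e6), 1000, 1000)

# Original benchmark: converts A to a Matrix object, factors it, and
# expands the result into explicit P, L and U factors.
LUP <- expand(lu(Matrix(A)))

# Simon Urbanek's suggestion: factor directly; the result is a compact
# "denseLU" object, with no explicit expansion of the factors.
f <- lu(A)
class(f)  # "denseLU"
```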
Time in seconds on a Dell Precision 690, 2 x Intel® Xeon® E5345 CPU @ 2.33GHz,
16GB RAM, 2 processors, 8 cores and 8 threads,
Windows 7 Professional, 64-bit.
A work-in-progress report containing the tables and the code used is here.
UPDATE 2:
I have modified the matrix benchmark to test Matrix Decompositions only. These are the foundations on which all other matrix algorithms are built, and if these are shaky then all other algorithms will be shaky too.
I have switched to a brand-new
Lenovo ThinkPad X220, Intel Core i7-2640M CPU @ 2.80GHz,
8GB RAM, 1 processor, 2 cores and 4 threads,
Windows 7 Professional, 64-bit.
Note: The Core i7 processor has Intel's Turbo Boost, which increases the clock rate to as much as 3.5GHz when it senses high demand. As far as I know, Turbo Boost is not under program (or programmer) control in any of the three systems.
These changes will, I hope, make the results more useful.
Table 3. Times (secs)
                 A*A     chol(A)  lu(A)   qr(A)   svd(A)  eig(A)  Total
-----------------------------------------------------------------------------
R 2.14.1         0.904   0.157    0.260   0.568   4.260   6.967   13.11
Revo R 2.13.2    0.121   0.029    0.062   0.411   1.623   3.265    5.51
Matlab 2011b     0.061   0.014    0.033   0.056   0.342   0.963    1.47
-----------------------------------------------------------------------------
Times (secs) averaged over 20 runs
Table 4. Times (normalized)
                 A*A     chol(A)  lu(A)   qr(A)   svd(A)  eig(A)  Total
----------------------------------------------------------------------------
R 2.14.1         1.000   0.174    0.288   0.628    4.714   7.711  14.52
Revo R 2.13.2    1.000   0.237    0.515   3.411   13.469  27.095  45.73
Matlab 2011b     1.000   0.260    0.610   0.967    5.768  16.774  25.38
----------------------------------------------------------------------------
Each row normalized by that system's A*A time; averaged over 20 runs
We can see from Table 4 that the spurious anomaly has disappeared and that all systems behave as the theory predicts.
Table 5. Times / Matlab times
                 A*A   chol(A)  lu(A)  qr(A)  svd(A)  eig(A)  Total
----------------------------------------------------------------------------
R 2.14.1         15    11       8      10     12      7       9
Revo R 2.13.2     2     2       2       7      5      3       4
----------------------------------------------------------------------------
Rounded to the nearest integer
FWIW, you're not measuring the time of the decomposition but the overhead you create by converting the matrices all the time, so chances are you're creating the overhead yourself. This matters in particular when the decomposition is fast. Also note that talking about the performance of "R" is ambiguous, since the performance of R itself varies strongly depending on which BLAS you use.
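A minimal sketch of that point: time the conversion, the factorization, and the expansion separately, so the overhead around the factorization is visible (uses the Matrix package):

```r
library(Matrix)
set.seed(42)
A <- matrix(rnorm(1e6), 1000, 1000)

system.time(M <- Matrix(A))    # conversion to a Matrix object (overhead)
system.time(f <- lu(M))        # the LU factorization itself
system.time(e <- expand(f))    # expanding into explicit P, L, U (overhead)
```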
(As a side note, you may want to use system.time in your benchmarks; the microbenchmark R package may also be of interest to you.)
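A sketch of both suggestions (assumes the microbenchmark package is installed):

```r
A <- matrix(rnorm(250000), 500, 500)

# system.time() reports user, system and elapsed time for one evaluation:
system.time(A %*% A)

# microbenchmark() runs the expression many times and reports the
# distribution of timings, which is more robust for fast operations:
library(microbenchmark)
microbenchmark(A %*% A, times = 20)
```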