This double loop is 50 times slower in Chez Scheme than in C++ (compiled with --optimize-level 3
and -O3
, respectively)
(import
(rnrs)
(rnrs r5rs))
(let* ((n (* 1024 16))
(a (make-vector n))
(acc 0))
(do ((i 0 (+ i 1)))
((= i n) #f)
(vector-set! a i (cons (cos i) (sin i))))
(do ((i 0 (+ i 1)))
((= i n) #f)
(do ((j 0 (+ j 1)))
((= j n) #f)
(let ((ai (vector-ref a i))
(aj (vector-ref a j)))
(set! acc (+ acc (+ (* (car ai) (cdr aj))
(* (cdr ai) (car aj))))))))
(write acc)
(newline))
(exit)
vs
#include <iostream>
#include <cmath>
#include <vector>
#include <algorithm>
typedef std::pair<double, double> pr;
typedef std::vector<pr> vec;
double loop(const vec& a)
{
double acc = 0;
const int n = a.size();
for(int i = 0; i < n; ++i)
for(int j = 0; j < n; ++j)
{
const pr& ai = a[i];
const pr& aj = a[j];
acc += ai .first * aj.second +
ai.second * aj .first;
}
return acc;
}
int main()
{
const int n = 1024 * 16;
vec v(n);
for(int i = 0; i < n; ++i)
v[i] = pr(std::cos(i), std::sin(i));
std::cout << loop(v) << std::endl;
}
I realize that there is more memory indirection in Scheme than in C++ here, but still the performance difference is surprising...
Is there a simple way to speed up the Scheme version? (Without changing the memory layout to something totally unidiomatic)
So while these programs do look the same they are not the same. You are using fixnum arithmetic in the C version while the Scheme version uses the standard numeric tower. To make a C version more like Scheme try using a bignum library for your calculations.
As a test I replaced the arithmetic with
(rnrs arithmetic flonums)
and(rnrs arithmetic fixnums)
and it halved the execution time in DrRacket. I expect the same to happen in any implementation.Now my initial tests showed that the C code executed about 25 times faster and not 50 as expected and by changing to floating point arithmetic I'm down to C being about 15 times faster.
I think I can make it even faster by using unsafe procedures, since Scheme does check the type of each argument at runtime it does operations before each procedure which doesn't happen in the C version. As a test I changed it to use unsafe procedures in my implementation and now it's only 10 times slower.
Hope it helps also in Chez :)
EDIT
Here is my modified source which improves the speed 2 times:
And the implementation specific (lock in) just to tell that the type checking done in runtime does have an impact this code runs 30% faster than the previous optimization:
I have made an effort to keep the style of the original code. It's not very idiomatic Scheme. Eg. I would't have used
set!
at all, but it does not affect the speed.