Can one speed up this Chez Scheme microbenchmark?

518 Views Asked by At

This double loop is 50 times slower in Chez Scheme than in C++ (compiled with --optimize-level 3 and -O3, respectively)

(import
  (rnrs)
  (rnrs r5rs))


(let* ((n (* 1024 16))
       (a (make-vector n))
       (acc 0))

  (do ((i 0 (+ i 1)))
    ((= i n) #f)
    (vector-set! a i (cons (cos i) (sin i))))

  (do ((i 0 (+ i 1)))
    ((= i n) #f)
    (do ((j 0 (+ j 1)))
      ((= j n) #f)
      (let ((ai (vector-ref a i))
            (aj (vector-ref a j)))
        (set! acc (+ acc (+ (* (car ai) (cdr aj))
                            (* (cdr ai) (car aj))))))))

  (write acc)
  (newline))

(exit)

vs

#include <iostream>
#include <cmath>
#include <vector>
#include <algorithm>

typedef std::pair<double, double> pr;

typedef std::vector<pr> vec;

double loop(const vec& a)
{
    double acc = 0;
    const int n = a.size();

    for(int i = 0; i < n; ++i)
        for(int j = 0; j < n; ++j)
        {
            const pr& ai = a[i];
            const pr& aj = a[j];
            acc += ai .first * aj.second + 
                   ai.second * aj .first;
        }

    return acc;
}

int main()
{
    const int n = 1024 * 16;

    vec v(n);

    for(int i = 0; i < n; ++i)
        v[i] = pr(std::cos(i), std::sin(i));

    std::cout << loop(v) << std::endl;
}

I realize that there is more memory indirection in Scheme than in C++ here, but still the performance difference is surprising...

Is there a simple way to speed up the Scheme version? (Without changing the memory layout to something totally unidiomatic)

1

There are 1 best solutions below

5
On BEST ANSWER

So while these programs do look the same they are not the same. You are using fixnum arithmetic in the C version while the Scheme version uses the standard numeric tower. To make a C version more like Scheme try using a bignum library for your calculations.

As a test I replaced the arithmetic with (rnrs arithmetic flonums) and (rnrs arithmetic fixnums) and it halved the execution time in DrRacket. I expect the same to happen in any implementation.

Now my initial tests showed that the C code executed about 25 times faster and not 50 as expected and by changing to floating point arithmetic I'm down to C being about 15 times faster.

I think I can make it even faster by using unsafe procedures, since Scheme does check the type of each argument at runtime it does operations before each procedure which doesn't happen in the C version. As a test I changed it to use unsafe procedures in my implementation and now it's only 10 times slower.

Hope it helps also in Chez :)

EDIT

Here is my modified source which improves the speed 2 times:

#!r6rs
(import
 (rnrs)
 ;; import the * and + that only work on floats (which are faster, but they still check their arguments)
 (only (rnrs arithmetic flonums) fl+ fl*))


(let* ((n (* 1024 16))
       (a (make-vector n))
       (acc 0.0)) ; We want float, lets tell Scheme about that!

  ;; using inexact f instead of integer i
  ;; makes every result of cos and sin inexact
  (do ((i 0 (+ i 1))
       (f 0.0 (+ f 1)))
    ((= i n) #f)
    (vector-set! a i (cons (cos f) (sin f))))

  (do ((i 0 (+ i 1)))
    ((= i n) #f)
    (do ((j 0 (+ j 1)))
      ((= j n) #f)
      (let ((ai (vector-ref a i))
            (aj (vector-ref a j)))
        ;; use float versions of + and *
        ;; since this is where most of the time is used
        (set! acc (fl+ acc
                       (fl+ (fl* (car ai) (cdr aj))
                            (fl* (cdr ai) (car aj))))))))

  (write acc)
  (newline))

And the implementation specific (lock in) just to tell that the type checking done in runtime does have an impact this code runs 30% faster than the previous optimization:

#lang racket

;; this imports import the * and + for floats as unsafe-fl* etc. 
(require racket/unsafe/ops)

(let* ((n (* 1024 16))
       (a (make-vector n))
       (acc 0.0)) ; We want float, lets tell Scheme about that!

  (do ((i 0 (+ i 1))
       (f 0.0 (+ f 1)))
    ((= i n) #f)
    ;; using inexact f instead of integer i
    ;; makes every result of cos and sin inexact
    (vector-set! a i (cons (cos f) (sin f))))

  (do ((i 0 (+ i 1)))
    ((= i n) #f)
    (do ((j 0 (+ j 1)))
      ((= j n) #f)
      ;; We guarantee argument is a vector
      ;; and nothing wrong will happen using unsafe accessors
      (let ((ai (unsafe-vector-ref a i))
            (aj (unsafe-vector-ref a j)))
        ;; use unsafe float versions of + and *
        ;; since this is where most of the time is used
        ;; also use unsafe car/cdr as we guarantee the argument is
        ;; a pair.
        (set! acc (unsafe-fl+ acc
                              (unsafe-fl+ (unsafe-fl* (unsafe-car ai) (unsafe-cdr aj))
                                          (unsafe-fl* (unsafe-cdr ai) (unsafe-car aj))))))))

  (write acc)
  (newline))

I have made an effort to keep the style of the original code. It's not very idiomatic Scheme. Eg. I would't have used set! at all, but it does not affect the speed.