Why is sin slower in WebAssembly than in JavaScript?


I have a very simple benchmark that is run via Catch2 and compiled with -O3 using Emscripten 3.1.37:

    BENCHMARK("cpp sin only")
    {
        double sum = 1.0;
        for (int t = 0; t < 2000000; ++t) {
            sum += sin(double(t));
        }
        return sum;
    };

#ifdef __EMSCRIPTEN__
    BENCHMARK("js sin only")
    {
        EM_ASM_DOUBLE({
            let sum = 1;
            for (let i = 0; i < 2000000; i++) {
                sum = sum + Math.sin(i);
            }
            return sum;
        });
    };
#endif

I would expect no large difference between JavaScript and WebAssembly, but there is one:

Chrome:
benchmark name                       samples       iterations    est run time
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
cpp sin only                                   100             1     7.93775 s 
                                        79.3856 ms     79.147 ms    79.7195 ms 
                                        1.43061 ms    1.10437 ms    1.97222 ms 
                                                                               
js sin only                                    100             1     2.21506 s 
                                        22.1354 ms    22.0064 ms       22.3 ms 
                                        742.138 us    614.746 us    901.128 us 

Native, compiled with GCC 12.3.0, I get 24.2 ms.

  • To my understanding, JavaScript uses double-precision floats for all numbers, so the comparison should be fair. When using float in the C++ version, it gets to 12 ms in Chrome, but that is still slower (and less precise). Firefox sits at around 30 ms.
  • Maybe JavaScript uses a less precise but faster implementation of sin and sqrt? Adding -ffast-math doesn't increase performance for double. Float with -ffast-math becomes as fast as JavaScript in Chrome; in Firefox it's still at around 30 ms.
  • Is WebAssembly perhaps given less time in the optimiser? That could explain why it's so much slower in Firefox. But shouldn't Emscripten take care of most of the optimisation?
  • Could it be some sort of protection against Meltdown/Spectre?

Update (I ran several additional benchmarks after a request in the comments):

  • g++ 12 is a bit faster than clang 15, but the difference is within 10%.
  • Performance of sqrt is almost equal between the versions (an earlier version of the example had both a sqrt and a sin).
  • Most of the time is spent in sin.
  • Increasing the iteration count to 2 million makes JS about 4x faster; increasing it to 5 million increases the lead to 10x. JS is still about the same speed as native C++.
  • Note that the benchmark is executed 100 times by Catch2. The runtime of the above code is around 1 s.
  • I verified that it's not as simple as JS using float: the WebAssembly C++ computation matches the JS result exactly.
  • Using 123456789042+1000000 increased the runtime by about 3-4x on native GCC and clang, WebAssembly C++ and JS (the relative performance of WebAssembly vs. JS stayed about the same).
  • For reference, this is the code I used: https://pastebin.com/Mu2barB6 and here are the results for Chrome: https://pastebin.com/Hbte7yRj
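
The "not just float" check from the list above can also be reproduced natively. This is only a minimal sketch, not the code from the pastebin: `sum_sin` and `sum_sin_reference` are hypothetical names, and the closed-form reference is an extra sanity check that the original benchmark does not contain.

```cpp
#include <cmath>

// Sum of sin(t) for t = 0 .. n-1, evaluated and accumulated in type T.
// The starting value 1 mirrors the benchmark in the question.
// (Hypothetical helper, not part of the original benchmark code.)
template <typename T>
T sum_sin(int n) {
    T sum = T(1);
    for (int t = 0; t < n; ++t) {
        sum += std::sin(static_cast<T>(t));
    }
    return sum;
}

// Closed form of the same sum, used as an independent reference:
//   sum_{t=0}^{n-1} sin(t) = sin((n-1)/2) * sin(n/2) / sin(1/2)
// The leading 1.0 accounts for the starting value used above.
double sum_sin_reference(int n) {
    return 1.0 + std::sin((n - 1) / 2.0) * std::sin(n / 2.0) / std::sin(0.5);
}
```

Comparing `sum_sin<double>(n)` and `sum_sin<float>(n)` against `sum_sin_reference(n)` shows how far a float-based computation drifts from the double one. If the JS result agrees with the double sum to the last digit, a float shortcut on the JS side is unlikely.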

Update 2:

After a comment by user21489919, I reported the issue to Emscripten.


There is 1 best solution below

user21489919

You might double check (ha!) that the long double version of sin() is not being invoked for some reason. It's not obvious from your code sample why it would be, but the C++ standard library provides long double (128 bit in clang) versions of sin().

You could be double double sure that regular double is used by compiling the code as .c instead of .cpp and using #include <math.h>.
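
If switching to C is inconvenient, the same check can be sketched at compile time in C++. This assumes the benchmark calls unqualified `sin(double(t))` as in the question; `BenchmarkSinResult` and `sin_resolves_to_double` are hypothetical names. It only shows which overload the compiler selects, not which libm implementation ends up running.

```cpp
#include <cmath>
#include <math.h>
#include <type_traits>

// Result type of the exact call expression used in the benchmark.
using BenchmarkSinResult = decltype(sin(double(0)));

// Compilation fails here if overload resolution picked the float or
// long double version of sin for a double argument.
static_assert(std::is_same<BenchmarkSinResult, double>::value,
              "sin(double) should resolve to the double overload");

// For contrast: a long double argument selects the long double overload.
static_assert(std::is_same<decltype(std::sin(0.0L)), long double>::value,
              "std::sin(long double) should return long double");

// Runtime-queryable form of the first check (hypothetical helper).
constexpr bool sin_resolves_to_double() {
    return std::is_same<BenchmarkSinResult, double>::value;
}
```

If both static_asserts pass under em++, the slowdown is probably not caused by an accidental long double call, and the sin implementation itself is the next place to look.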