Stream benchmark returns impossible bandwidth

Specifications:

  • Intel i5-1035G1
  • Ubuntu 22.04.3 LTS (Dual boot)

Hi everyone!

I tried to write a STREAM benchmark on my computer for a course, but the bandwidth results I get seem far too high.

First I did it in plain Python, in a Jupyter notebook. The results were OK (~1 GB/s for lists, decreasing with length; 0.7 GB/s for array.array, roughly constant).

Anyway, that is not the problem.

Then I tried with Cython, and now it reports a bandwidth of 14 GB/s (using a data size of 8 bytes). Even if that figure is theoretically possible, I don't think it's right, because my code is not multi-threaded (to be sure, I ran htop during the benchmark and there was indeed only one core at 100% while the others were below 5%).

Please help my stupid ass

Cython file:

#cython: boundscheck=False
import time
import numpy as np
cimport numpy as cnp

def stream(unsigned int STREAM_ARRAY_SIZE):
    cdef cnp.float64_t[:] a, b, c
    cdef double scalar

    a = np.ones(STREAM_ARRAY_SIZE, dtype=np.float64)
    b = np.ones(STREAM_ARRAY_SIZE, dtype=np.float64) * 2.0
    c = np.zeros(STREAM_ARRAY_SIZE, dtype=np.float64)
    scalar = 2.0

    times = [0, 0, 0, 0]
    timer = time.time_ns

    def copy():
        cdef unsigned int i
        times[0] = timer()
        for i in range(STREAM_ARRAY_SIZE):
            c[i] = a[i]
        times[0] = timer() - times[0]

    def scale():
        cdef unsigned int i
        times[1] = timer()
        for i in range(STREAM_ARRAY_SIZE):
            b[i] = scalar*c[i]
        times[1] = timer() - times[1]

    def add():
        cdef unsigned int i
        times[2] = timer()
        for i in range(STREAM_ARRAY_SIZE):
            c[i] = a[i]+b[i]
        times[2] = timer() - times[2]

    def triad():
        cdef unsigned int i
        times[3] = timer()
        for i in range(STREAM_ARRAY_SIZE):
            a[i] = b[i]+scalar*c[i]
        times[3] = timer() - times[3]

    copy()
    scale()
    add()
    triad()

    # Times are in ns, so bytes / time gives GB/s directly, with no conversion
    return times

Python file:

import cythonstream
import matplotlib.pyplot as plt
import statistics


def bandwidth(STREAM_ARRAY_SIZE):
    times = cythonstream.stream(STREAM_ARRAY_SIZE)
    copy, scale, add, triad = times

    copy = (2 * 8 * STREAM_ARRAY_SIZE) / copy
    scale = (2 * 8 * STREAM_ARRAY_SIZE) / scale
    add = (3 * 8 * STREAM_ARRAY_SIZE) / add
    triad = (3 * 8 * STREAM_ARRAY_SIZE) / triad

    return copy, scale, add, triad


def plot(nb_experiment=5):
    """
    Plot the STREAM benchmark results, averaged over nb_experiment runs
    """
    WANTED_VALUES = [i for i in range(1000 * 1000, 50 * 1000 * 1000, 1000 * 1000)]

    copy_bandwidth, scale_bandwidth, add_bandwidth, triad_bandwidth = (
        [[] for _ in range(nb_experiment)],
        [[] for _ in range(nb_experiment)],
        [[] for _ in range(nb_experiment)],
        [[] for _ in range(nb_experiment)],
    )

    for i in range(nb_experiment):
        for value in WANTED_VALUES:
            copy, scale, add, triad = bandwidth(value)

            copy_bandwidth[i].append(copy)
            scale_bandwidth[i].append(scale)
            add_bandwidth[i].append(add)
            triad_bandwidth[i].append(triad)

    # Averages on nb_experiment and plot on WANTED_VALUES
    # ...


if __name__ == "__main__":
    plot()

I used these formulae from my course for the total amount of data moved by each kernel:

copy  -> 2 * sizeof(STREAM_ARRAY_TYPE) * STREAM_ARRAY_SIZE,
scale -> 2 * sizeof(STREAM_ARRAY_TYPE) * STREAM_ARRAY_SIZE,
add   -> 3 * sizeof(STREAM_ARRAY_TYPE) * STREAM_ARRAY_SIZE,
triad -> 3 * sizeof(STREAM_ARRAY_TYPE) * STREAM_ARRAY_SIZE
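To make the arithmetic explicit, here is a minimal sketch of the conversion, using the same per-kernel factors as the `bandwidth` function in the Python file above (2 for copy and scale, 3 for add and triad). The helper name `bandwidth_gb_per_s` and the timing in the example are made up for illustration; they are not measurements.

```python
# Words moved per array element for each kernel (reads + writes),
# matching the factors used in bandwidth() in the Python file.
WORDS_PER_ELEMENT = {"copy": 2, "scale": 2, "add": 3, "triad": 3}


def bandwidth_gb_per_s(kernel, array_size, elapsed_ns, itemsize=8):
    """Return the bandwidth in GB/s, with 1 GB = 10**9 bytes.

    One byte per nanosecond equals 10**9 bytes per second,
    so bytes / ns needs no further conversion.
    """
    bytes_moved = WORDS_PER_ELEMENT[kernel] * itemsize * array_size
    return bytes_moved / elapsed_ns


# Hypothetical timing: copying 10 million doubles in 20 ms.
print(bandwidth_gb_per_s("copy", 10_000_000, 20_000_000))  # 8.0
```

Note that these counts only cover the reads and writes written in the source code; any extra traffic the hardware generates is not included.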

PS: It turns out the results I got in Jupyter (pure Python) are wrong, because I used Python's sys.getsizeof, which always gives 24 bytes per element instead of 8 for the arrays of doubles, for example. The real values should be even lower, which makes me doubt the large difference with Cython even more.
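A small sketch of the size mismatch described in the PS, assuming a standard 64-bit CPython build: `sys.getsizeof` on a float reports the size of the whole boxed Python object, while `itemsize` reports the raw element stored in the array. The variable names are only for illustration.

```python
import sys
from array import array

values = array("d", [1.0, 2.0, 3.0])

# Size of one boxed Python float, object header included.
# The exact number is a CPython implementation detail.
print(sys.getsizeof(1.0))

# Size of one raw element stored in the array: a C double.
print(values.itemsize)

# Payload size that belongs in the bandwidth formula.
payload_bytes = values.itemsize * len(values)
print(payload_bytes)
```

Using `itemsize` (or `ndarray.itemsize` for NumPy arrays) rather than `sys.getsizeof` keeps the byte count tied to the data actually stored.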