Why is Cython compiled to C much faster than the C++ equivalent?


I sadly can't share the source code due to an NDA, but I think the question would still be interesting regardless.

Context

I have a file (my_cython_file.pyx) which I sped up with cython with the usual:

my_cython_file.pyx -transpiling-> my_cython_file.c -compiling-> my_cython_file.so

Since I wanted to use some external C++ libraries, I decided to transpile my code to C++ instead, with minimal changes to the pyx file:

my_cython_file.pyx -transpiling-> my_cython_file.cpp -compiling-> my_cython_file.so

Since the input file was essentially unchanged, I didn't think this would have any impact on the performance. However, the C++ version is about 20 times slower than the C version.
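For reference, a slowdown factor like this can be measured with a small timing harness along the following lines. This is only a sketch: `best_time` and `slowdown` are illustrative helper names, not part of the real project, and the two callables passed in would be the same function imported from the C build and from the C++ build.

```python
import timeit


def best_time(func, *args, repeat=5, number=10):
    """Return the best per-call wall time in seconds over `repeat` runs."""
    timer = timeit.Timer(lambda: func(*args))
    # Taking the minimum reduces noise from other processes on the machine.
    return min(timer.repeat(repeat=repeat, number=number)) / number


def slowdown(fast_func, slow_func, *args, repeat=5, number=10):
    """Return how many times slower `slow_func` is than `fast_func`."""
    fast = best_time(fast_func, *args, repeat=repeat, number=number)
    slow = best_time(slow_func, *args, repeat=repeat, number=number)
    return slow / fast


# Hypothetical usage, assuming the two builds are importable under
# different names (they are not named this way in the real project):
#   ratio = slowdown(c_build.hot_function, cpp_build.hot_function, data)
```

Both builds should be timed in the same process and on the same input so that only the compiled code differs.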

I searched online for similar experiences, and it seems the slowdown could be related to the compiler flags. I've been playing around with the flags, but I don't have much experience with compilers and haven't managed to get very far.

I'm using a really basic cythonize call in my setup.py file to carry out both compilation steps, just changing the language from C to C++ and running python setup.py build_ext --inplace. (I've pasted my setup.py file at the end of the question)

C Compilation

running build_ext
building 'my_pkg.my_cython_file' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/path/to/my/venv/include -I/usr/include/python3.6m -c my_pkg/my_cython_file.c -o build/temp.linux-x86_64-3.6/my_pkg/my_cython_file.o
In file included from /usr/include/python3.6m/numpy/ndarraytypes.h:1809:0,
                 from /usr/include/python3.6m/numpy/ndarrayobject.h:18,
                 from /usr/include/python3.6m/numpy/arrayobject.h:4,
                 from my_pkg/my_cython_file.c:624:
/usr/include/python3.6m/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it by " \
  ^~~~~~~
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.6/my_pkg/my_cython_file.o -o build/lib.linux-x86_64-3.6/my_pkg/my_cython_file.cpython-36m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.6/my_pkg/my_cython_file.cpython-36m-x86_64-linux-gnu.so -> my_pkg

C++ Compilation

building 'my_pkg.my_cython_file' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/path/to/my/venv/lib/python3.6/site-packages/numpy/core/include -I/usr/include/python3.6m -I/usr/src/algorithms/venv_dev/include/python3.6m -c my_pkg/my_cython_file.cpp -o build/temp.linux-x86_64-3.6/my_pkg/my_cython_file.o
In file included from /usr/include/python3.6m/numpy/ndarraytypes.h:1809:0,
                 from /usr/include/python3.6m/numpy/ndarrayobject.h:18,
                 from /usr/include/python3.6m/numpy/arrayobject.h:4,
                 from my_pkg/my_cython_file.cpp:638:
/usr/include/python3.6m/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it by " \
  ^~~~~~~
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.6/my_pkg/my_cython_file.o -o build/lib.linux-x86_64-3.6/my_pkg/my_cython_file.cpython-36m-x86_64-linux-gnu.so

I was playing around with the compilation flags using the extra_compile_args in the setup.py (see file at the end). However, those extra args get appended to the end so the log is slightly different:

x86_64-linux-gnu-gcc .........  -o /my_cython_file.o -std=c++14 -fopenmp -O3 -ffast-math

x86_64-linux-gnu-g++ -o my_cython_file.cpython-36m-x86_64-linux-gnu.so -std=c++14 -fopenmp -O3 -ffast-math
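As far as I understand, the flags that appear before the appended extras are not chosen by Cython. On Unix-like systems, distutils/setuptools take them from the build configuration recorded when the Python interpreter itself was compiled. The sketch below prints those recorded values using only the standard library. The helper name `default_build_config` is illustrative, and some variables may be missing (reported as `None`) on a given platform.

```python
import sysconfig


def default_build_config():
    """Collect the interpreter's recorded build settings (values may be None)."""
    names = ("CC", "CXX", "OPT", "CFLAGS", "LDSHARED", "LDCXXSHARED")
    return {name: sysconfig.get_config_var(name) for name in names}


if __name__ == "__main__":
    for name, value in default_build_config().items():
        print(f"{name:12} = {value}")
```

If `CC` reports a gcc driver and `CXX` a g++ driver, that would match the logs above, where the compile step uses the C driver (gcc picks the language from the `.cpp` extension) and only the link step switches to g++.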

Questions

  • The compilation steps look very similar. Could the flags really be the reason for the 20x runtime difference?

  • I'm pretty sure that the order of the compilation flags matters, so since cython is putting the extra_compile_args at the end of the command, do they really have an effect? For instance, there's a -O1 right at the beginning of the command and cython adds my -O3 at the end. Which one has priority in this case?

x86_64-linux-gnu-g++ ... -O1 .... -o output.so -std=c++14 -fopenmp -O3 -ffast-math
                         ^^^^                                      ^^^^
                        default                                   from "extra_compile_args"

  • For the C++ compilation, I'm not really sure why cython is using x86_64-linux-gnu-gcc first (to build the .o) and then x86_64-linux-gnu-g++ afterwards (to build the .so). Shouldn't it just use g++ for both? Or just run g++ once?
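On the flag-order question, two details seem relevant. GCC documents that when several -O options are given, the last one is the one that takes effect. Also, in the logs above the -O1 appears as -Wl,-O1, and the -Wl, prefix forwards an option to the linker rather than setting the compiler's optimization level. A minimal sketch for checking a logged command line under those two rules follows. `effective_opt_level` is an illustrative helper that only inspects the text of the command and does not run the compiler.

```python
import shlex


def effective_opt_level(command):
    """Return the last compiler -O flag in `command`, or None if there is none.

    Options wrapped in -Wl,... are forwarded to the linker, so they are
    skipped here rather than treated as compiler optimization levels.
    """
    level = None
    for token in shlex.split(command):
        if token.startswith("-Wl,"):
            continue  # linker option, e.g. -Wl,-O1
        if token.startswith("-O"):
            level = token  # a later -O flag overrides an earlier one
    return level
```

Applied to the compile command from the log (with -O2 near the start and -O3 appended), this reports -O3. Applied to the plain link command, it reports no compiler optimization level at all.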

Appendix

Here is the C++ setup.py, for reference

#! /usr/bin/env python3

# std imports
# from distutils.core import setup
from setuptools import setup, Extension, find_packages
import sys
import os
import numpy as np
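from Cython.Build import cythonize

# NOTE: `requirements` was not defined in the posted snippet; an empty list is
# assumed here so the script runs. Replace it with the real dependency list.
requirements = []

extensions = [Extension("*", ['my_pkg/*.pyx'],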
                        extra_compile_args=['-std=c++14', '-fopenmp', '-O3', '-ffast-math'],
                        extra_link_args=['-std=c++14', '-fopenmp', '-O3', '-ffast-math'],
                        )]

setup(
    name='my_pkg',
    version='0.1.0',
    author='me',
    packages=find_packages(),
    ext_modules=cythonize(extensions,
                          compiler_directives={'language_level': '3',},
                          gdb_debug=True,
                          annotate=True,
                          language='c++',
                          ),
    url='todo',
    license='todo',
    install_requires=requirements
)
