I sadly can't share the source code due to an NDA, but I think the question would still be interesting regardless.
Context
I have a file (my_cython_file.pyx) which I sped up with Cython in the usual way:
my_cython_file.pyx -transpiling-> my_cython_file.c -compiling-> my_cython_file.so
Since I wanted to use some external C++ libraries, I decided to transpile my code to C++ instead, with minimal changes to the pyx file:
my_cython_file.pyx -transpiling-> my_cython_file.cpp -compiling-> my_cython_file.so
Since the input file was essentially unchanged, I didn't think this would have any impact on the performance. However, the C++ version is about 20 times slower than the C version.
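A 20x gap is large enough that it is worth pinning down with a repeatable measurement before digging into flags. Below is a minimal sketch using only the standard library's timeit module. It assumes the two variants can be built side by side under different module names; my_cython_file_c, my_cython_file_cpp, run and data are hypothetical names, not taken from the original code.

```python
import timeit
from typing import Callable


def time_callable(func: Callable[[], object], number: int = 10, repeat: int = 5) -> float:
    """Best per-call time in seconds over `repeat` rounds of `number` calls each."""
    rounds = timeit.repeat(func, number=number, repeat=repeat)
    # The minimum is usually the least noisy estimate; slower rounds mostly
    # reflect interference from other processes, not the code under test.
    return min(rounds) / number


if __name__ == "__main__":
    # Hypothetical usage: build the C and C++ variants as two differently named
    # modules and time the same entry point in each (all names are made up):
    #   import my_cython_file_c, my_cython_file_cpp
    #   t_c = time_callable(lambda: my_cython_file_c.run(data))
    #   t_cpp = time_callable(lambda: my_cython_file_cpp.run(data))
    #   print(f"C++ / C time ratio: {t_cpp / t_c:.1f}x")
    print(f"{time_callable(lambda: sum(range(10_000))):.2e} s per call")
```

Taking the minimum rather than the mean follows the advice in the timeit documentation.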
I searched online to see whether anyone had a similar experience, and it seems this could be related to the compiler flags. I've been playing around with the flags, but I don't have much experience with compilers and haven't gotten very far.
I'm using a really basic cythonize call in my setup.py file to carry out both compilation steps, just changing the language from C to C++ and running python setup.py build_ext --inplace. (I've pasted my setup.py file at the end of the question.)
C Compilation
running build_ext
building 'my_pkg.my_cython_file' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/path/to/my/venv/include -I/usr/include/python3.6m -c my_pkg/my_cython_file.c -o build/temp.linux-x86_64-3.6/my_pkg/my_cython_file.o
In file included from /usr/include/python3.6m/numpy/ndarraytypes.h:1809:0,
from /usr/include/python3.6m/numpy/ndarrayobject.h:18,
from /usr/include/python3.6m/numpy/arrayobject.h:4,
from my_pkg/my_cython_file.c:624:
/usr/include/python3.6m/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by " \
^~~~~~~
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.6/my_pkg/my_cython_file.o -o build/lib.linux-x86_64-3.6/my_pkg/my_cython_file.cpython-36m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.6/my_pkg/my_cython_file.cpython-36m-x86_64-linux-gnu.so -> my_pkg
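The NumPy deprecation warning in this log names the macro it wants defined. In setup.py this is usually expressed through the define_macros argument of Extension, e.g. define_macros=[("NPY_NO_DEPRECATED_API", "NPY_1_7_API_VERSION")], which the build turns into a -D flag on the compile command. The standard-library sketch below mimics that translation so you can see what would appear in the log; macros_to_flags is a hypothetical helper, not a setuptools function. Defining the macro hides the deprecated NumPy API, so whether the generated code still compiles cleanly may depend on the Cython version.

```python
from typing import List, Optional, Sequence, Tuple


def macros_to_flags(macros: Sequence[Tuple[str, Optional[str]]]) -> List[str]:
    """Turn define_macros-style (name, value) pairs into GCC -D flags."""
    flags = []
    for name, value in macros:
        # A value of None defines the macro without a value (-DNAME).
        flags.append(f"-D{name}" if value is None else f"-D{name}={value}")
    return flags


# The macro requested by the NumPy warning, in define_macros form.
NUMPY_MACROS = [("NPY_NO_DEPRECATED_API", "NPY_1_7_API_VERSION")]

if __name__ == "__main__":
    print(macros_to_flags(NUMPY_MACROS))
```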
C++ Compilation
building 'my_pkg.my_cython_file' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/path/to/my/venv/lib/python3.6/site-packages/numpy/core/include -I/usr/include/python3.6m -I/usr/src/algorithms/venv_dev/include/python3.6m -c my_pkg/my_cython_file.cpp -o build/temp.linux-x86_64-3.6/my_pkg/my_cython_file.o
In file included from /usr/include/python3.6m/numpy/ndarraytypes.h:1809:0,
from /usr/include/python3.6m/numpy/ndarrayobject.h:18,
from /usr/include/python3.6m/numpy/arrayobject.h:4,
from my_pkg/my_cython_file.cpp:638:
/usr/include/python3.6m/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by " \
^~~~~~~
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.6/my_pkg/my_cython_file.o -o build/lib.linux-x86_64-3.6/my_pkg/my_cython_file.cpython-36m-x86_64-linux-gnu.so
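To check how similar the two builds really are, the compile commands can be compared token by token. The sketch below pastes the two compile lines from the logs above into a script and prints the flags that appear in only one of them; flag_diff is an illustrative helper, not part of any build tool. On these two lines the optimization-related flags (-O2, -fwrapv, -g and so on) are identical, and the only differences are the include paths and the source file suffix.

```python
import shlex
from typing import List, Tuple

# Compile commands copied from the two build logs.
C_COMPILE = (
    "x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g "
    "-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time "
    "-D_FORTIFY_SOURCE=2 -fPIC -I/path/to/my/venv/include -I/usr/include/python3.6m "
    "-c my_pkg/my_cython_file.c -o build/temp.linux-x86_64-3.6/my_pkg/my_cython_file.o"
)
CPP_COMPILE = (
    "x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g "
    "-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time "
    "-D_FORTIFY_SOURCE=2 -fPIC "
    "-I/path/to/my/venv/lib/python3.6/site-packages/numpy/core/include "
    "-I/usr/include/python3.6m -I/usr/src/algorithms/venv_dev/include/python3.6m "
    "-c my_pkg/my_cython_file.cpp -o build/temp.linux-x86_64-3.6/my_pkg/my_cython_file.o"
)


def flag_diff(a: str, b: str) -> Tuple[List[str], List[str]]:
    """Return (only_in_a, only_in_b), keeping each command's original order."""
    tokens_a, tokens_b = shlex.split(a), shlex.split(b)
    set_a, set_b = set(tokens_a), set(tokens_b)
    return ([t for t in tokens_a if t not in set_b],
            [t for t in tokens_b if t not in set_a])


if __name__ == "__main__":
    only_c, only_cpp = flag_diff(C_COMPILE, CPP_COMPILE)
    print("only in C build:  ", only_c)
    print("only in C++ build:", only_cpp)
```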
I was playing around with the compilation flags using extra_compile_args in setup.py (see the file at the end). However, those extra arguments get appended to the end of each command, so the log is slightly different:
x86_64-linux-gnu-gcc ......... -o /my_cython_file.o -std=c++14 -fopenmp -O3 -ffast-math
x86_64-linux-gnu-g++ -o my_cython_file.cpython-36m-x86_64-linux-gnu.so -std=c++14 -fopenmp -O3 -ffast-math
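The GCC manual documents the rule for repeated optimization options: if several -O options are given, the last one is the one that takes effect. Below is a small standard-library sketch of that rule, combined with a simplified model of the command order the logs suggest (extra_compile_args land after -o &lt;object&gt;). compile_command and effective_opt_level are hypothetical helpers for illustration, not distutils or Cython APIs. One detail the model makes visible: the -O1 near the start of the link command is written -Wl,-O1, which GCC forwards to the linker rather than treating as a compiler optimization level.

```python
from typing import List, Optional, Sequence


def compile_command(compiler_so: Sequence[str], pp_opts: Sequence[str], src: str,
                    obj: str, extra_postargs: Sequence[str]) -> List[str]:
    """Simplified model of the order seen in the logs (an assumption, not distutils code):
    compiler + preprocessor options + ['-c', src, '-o', obj] + extra_compile_args."""
    return [*compiler_so, *pp_opts, "-c", src, "-o", obj, *extra_postargs]


def effective_opt_level(argv: Sequence[str]) -> Optional[str]:
    """Return the -O option GCC acts on; per the GCC manual, the last one wins.

    Tokens such as '-Wl,-O1' do not start with '-O' (GCC passes them to the
    linker), so they are deliberately ignored here.
    """
    level = None
    for token in argv:
        if token.startswith("-O"):
            level = token
    return level


# Default flags from the logged compile command (trimmed), plus the extra args from setup.py.
DEFAULT_CC = ["x86_64-linux-gnu-gcc", "-pthread", "-DNDEBUG", "-g", "-fwrapv", "-O2", "-Wall"]
EXTRA_ARGS = ["-std=c++14", "-fopenmp", "-O3", "-ffast-math"]

if __name__ == "__main__":
    cmd = compile_command(DEFAULT_CC, ["-I/usr/include/python3.6m"],
                          "my_pkg/my_cython_file.cpp", "my_cython_file.o", EXTRA_ARGS)
    print(" ".join(cmd))
    print("compile step:", effective_opt_level(cmd))
    link = ["x86_64-linux-gnu-g++", "-pthread", "-shared", "-Wl,-O1",
            "my_cython_file.o", "-o", "my_cython_file.so"]
    print("link step (no extra args):", effective_opt_level(link))
```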
Questions
- So the compilation steps look very similar... Could this really be the reason for the 20x runtime difference?
- I'm pretty sure that the order of the compilation flags matters, so since Cython puts extra_compile_args at the end of the command, do they really have an effect? For instance, there's a -O1 right at the beginning of the command and Cython adds my -O3 at the end. Which one takes priority in this case?
x86_64-linux-gnu-g++ ... -O1 .... -o output.so -std=c++14 -fopenmp -O3 -ffast-math
                         ^^^                                       ^^^
                       default                          from "extra_compile_args"
- For the C++ compilation, I'm not really sure why Cython uses x86_64-linux-gnu-gcc first (to build the .o) and then x86_64-linux-gnu-g++ afterwards (to build the .so). Shouldn't it just use g++ for both? Or just run g++ once?
Appendix
Here is the C++ setup.py, for reference:
#! /usr/bin/env python3
# std imports
# from distutils.core import setup
from setuptools import setup, Extension, find_packages
import sys
import os
import numpy as np
from Cython.Build import cythonize
extensions = [Extension("*", ['my_pkg/*.pyx'],
extra_compile_args=['-std=c++14', '-fopenmp', '-O3', '-ffast-math'],
extra_link_args=['-std=c++14', '-fopenmp', '-O3', '-ffast-math'],
)]
requirements = []  # defined so the script runs; list the package's real dependencies here
setup(
name='my_pkg',
version='0.1.0',
author='me',
packages=find_packages(),
ext_modules=cythonize(extensions,
compiler_directives={'language_level': '3',},
gdb_debug=True,
annotate=True,
language='c++',
),
url='todo',
license='todo',
install_requires=requirements
)