C: Clarity needed on creating/using header files and Cython, Cythonize setup.py, MSVC, & GCC issues?



Hello all,

I am relatively new to Cython and very new to C but do have some programming experience with Python, Visual Basic and Java.

For my current project, my machine is running Windows 10 Pro 1909 x64, Python 3.7.9 x64 and Cython 0.29.21. My ultimate goal is to create an EXE file with all the modules included.

I have not added any cdef statements or the like yet; I plan to add these incrementally. What I am doing at the moment is a proof of concept, to show that I can compile and run current and future code without issues.

I have a __main__ module stored in the root project folder. My other Python modules (some very large) are renamed as .pyx files and stored in an 'includes' folder. Each one handles a different type of file (.csv, .json, .html, .xml, etc.), with its own characteristics and method of extraction.

As I understand it, the header files contain function definitions which are then called upon as needed to act as a bridge between the subroutines and the main module. I have not created any header files at this time as I need clarity on a few points.

I am also having trouble Cythonizing with setup.py (setuptools) through MSVC and GCC.

Below is a discussion of the steps I have taken so far with setup.py, GCC and running Cython directly from the prompt, with my main questions at the end.


Step 1

My first attempt at compiling the code is to prepare a setup.py file and run it from an elevated command prompt.

from setuptools import Extension, setup
from Cython.Build import cythonize

extensions = [
    Extension("main", ["__main__.pyx"],
    include_paths=r"C:\path\to\parsing_company_accounts_master_cython\includes"),
]
setup(
    name="iXBRLConnect",
    ext_modules=cythonize(extensions, compiler_directives={'language_level' : "3"}),
)

However, in Python 3.7.9 x64, I get the following output.

python setup.py bdist
running bdist
running bdist_dumb
running build
running build_ext
building 'main' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

I get this same error with every version of Python I have tried, and after installing many variants of the build tools (from 2015 onwards), even when running in an elevated x64 Native Tools Command Prompt (any version of VS or the standalone Build Tools).

Searching on this site points to many different SDKs, libraries and so forth that need to be added, but after following many answers and numerous restarts I am still unable to get setup.py to run.

I CAN compile with VS Community Edition through the GUI, but all my efforts are confounded when using the command line (which I prefer for no other reason than keeping a lean installation). It's not clear why the prompt route does not work.
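One way to narrow down why the prompt route fails is to look at what the Python process itself can see. The following is a minimal diagnostic sketch, not a fix. It assumes (this is my understanding of how setuptools locates MSVC, not something confirmed above) that the compiler is found either through the Visual Studio Installer's vswhere.exe or, when DISTUTILS_USE_SDK is set, through the environment of the current prompt. The name msvc_report and the default vswhere.exe location are assumptions made for illustration.

```python
import os
import shutil
import struct
import sys


def msvc_report(environ=None):
    """Collect a few facts that commonly matter when MSVC cannot be found."""
    env = os.environ if environ is None else environ
    # Assumed default install location of vswhere.exe (Visual Studio Installer).
    program_files_x86 = env.get("ProgramFiles(x86)", r"C:\Program Files (x86)")
    vswhere = os.path.join(
        program_files_x86, "Microsoft Visual Studio", "Installer", "vswhere.exe"
    )
    return {
        "python_bits": struct.calcsize("P") * 8,   # 64 for an x64 interpreter
        "python_version": sys.version.split()[0],
        "cl_on_path": shutil.which("cl", path=env.get("PATH", "")),
        "VCINSTALLDIR": env.get("VCINSTALLDIR"),   # set by the Native Tools prompts
        "DISTUTILS_USE_SDK": env.get("DISTUTILS_USE_SDK"),
        "vswhere_exists": os.path.isfile(vswhere),
    }


if __name__ == "__main__":
    for key, value in msvc_report().items():
        print(f"{key}: {value}")
```

Running this from the same prompt that runs setup.py shows whether that prompt's environment reaches Python at all; a 32-bit/64-bit mismatch or an empty cl_on_path would be the first things to look at.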


Step 2

Not to be outdone, I attempt to install GCC - MinGW-w64 (https://wiki.python.org/moin/WindowsCompilers), an alternative compiler that is supported up to Python 3.4.

Noting that Python 3.4 is past end of life, I uninstall Python 3.7.9 x64, install 3.4 and reinstall my pip site-packages.

However, installing BeautifulSoup4 gives me this message:

RuntimeError: Python 3.5 or later is required

I would take the EOL issue for Python 3.4 with a large pinch of salt but BS4 is a key library for my project so this is pretty much a showstopper.


Step 3

Finally, I attempt to build the files directly on the command line.

First, I move my other .pyx modules (9 in total) into "c:\path_with_spaces\to\includes", keeping __main__.pyx in the main project folder. Then I run the following command from the project folder.

cython -3 --annotate --embed=main __main__.pyx --include-dir "c:\path_with_spaces\to\includes"
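As far as I understand the cython command line, it only translates the source files named on the command line, and --include-dir adds a directory that is searched for cimport and include files rather than a list of extra sources. Under that assumption, each .pyx would need its own invocation. Below is a minimal sketch that only builds the argument lists and does not run them; cython_commands is a hypothetical helper, and the file and folder names are taken from the command above.

```python
from pathlib import Path


def cython_commands(main_pyx, includes_dir, embed_name="main"):
    """Build one `cython` argument list per .pyx file, without running anything."""
    includes_dir = Path(includes_dir)
    commands = []
    # Every module in the includes folder is translated to C on its own.
    for pyx in sorted(includes_dir.glob("*.pyx")):
        commands.append(["cython", "-3", "--annotate", str(pyx)])
    # Only the entry point gets --embed, which adds a C main() function.
    commands.append(
        [
            "cython", "-3", "--annotate", f"--embed={embed_name}",
            "--include-dir", str(includes_dir),
            str(main_pyx),
        ]
    )
    return commands


if __name__ == "__main__":
    import subprocess

    # Arguments are kept as lists, so a path with spaces needs no manual quoting.
    for cmd in cython_commands("__main__.pyx", r"c:\path_with_spaces\to\includes"):
        print(subprocess.list2cmdline(cmd))
```

Each list could then be passed to subprocess.run; the resulting .c files would still have to be compiled and linked by a C compiler afterwards.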

Questions

So, all the above said and done (phew!), here are the points I need clarity on:

Q1: It seems to me that the 'include_paths'/'--include-dir' arguments only specify additional directories and do not cause new C files to be created for the modules in them. I presume this is because there are no header files alongside the existing *.pyx modules? [Initially, I naively thought Cython would automatically generate the headers and .c files for them. Instead, nothing at all - .c or .h - is generated.] Is there something wrong with my command-line syntax for '--include-dir'? I expected the .c files to be generated regardless, so that I could just 'slot' the header files in, but there is no error to say otherwise. Or are the included files just meant to be read, with no other action taken on them, as you would expect from a library file?
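On the setup.py side, my understanding is that setuptools' Extension takes include_dirs (a list of directories the C compiler searches for .h files) and that cythonize takes include_path (directories searched for .pxd/.pxi files); neither turns the .pyx files found there into C. Under that assumption, each .pyx has to be listed as the source of its own Extension. Below is a minimal sketch. discover_pyx and build_ext_modules are hypothetical helper names, skipping __main__.pyx assumes the entry point is handled separately, and the dotted names assume the folder is importable as a package.

```python
from pathlib import Path


def discover_pyx(root):
    """Return (dotted_module_name, source_path) pairs for each .pyx under root."""
    root = Path(root)
    pairs = []
    for src in sorted(root.rglob("*.pyx")):
        if src.stem == "__main__":
            continue  # assumption: the entry point is built or run separately
        rel = src.relative_to(root)
        name = ".".join(rel.with_suffix("").parts)  # includes/foo.pyx -> includes.foo
        pairs.append((name, rel.as_posix()))
    return pairs


def build_ext_modules(root="."):
    """Turn the discovered files into extension modules (needs setuptools and Cython)."""
    # Imported here so discover_pyx() can be tried without either package installed.
    from setuptools import Extension
    from Cython.Build import cythonize

    extensions = [Extension(name, [path]) for name, path in discover_pyx(root)]
    return cythonize(extensions, compiler_directives={"language_level": "3"})
```

In a setup.py this would be used as setup(name="iXBRLConnect", ext_modules=build_ext_modules()), run from the project root so that the relative source paths resolve.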

Q2: As I continue to learn more, it is increasingly clear that the header files need to be prepared in advance, according to this: https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html and this: http://hplgit.github.io/primer.html/doc/pub/cython/cython-readable.html. However, as far as I can ascertain from their examples (unless I am looking at the wrong thing), they only call their modules from the main module at some point. Taking the last link, I am not clear about 'dice6_cwrap' in the dice6_cwrap.pyx example (I think it should be referenced in the main module, but this is not directly shown in the example). Also, might I need other files, perhaps a manifest of some sort?

Q3: In part answer to Q2, I think I can 'chain' modules together as explained in the question "How does chain of includes function in C++?". This is important to me because of how my code has worked up to now: it loads each module (depending on what files are found) and then runs through the modules in a 'chain' sequence, first parsing all elements in a soup object, then running through each line element, and finally extracting each attribute and inserting it into a common database. In practice that can mean up to 8 'links' in total, counting from the 'start' method in the submodule and depending on the attribute in question. FYI, some of the modules also use pandas, numpy and multiprocessing. Thinking aloud: including header files, does that mean prepping 16 files? Eww! (BUT, with a little luck and fingers crossed, there may be speed gains from C compilation vs Python interpretation, other bottlenecks permitting.)
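On the 'chain' itself: as far as I understand, a Cython-compiled module is imported with the ordinary import statement just like a .py file, so a chain of modules that only import one another at the Python level should not need one hand-written header per link. Header files come into play when C code has to call into a module or when an external C library is being wrapped. Below is a small sketch for checking, after a build, which links of such a chain are actually running as compiled extensions; is_compiled is a hypothetical helper name, and the module names in the loop are stand-ins.

```python
import importlib
from importlib.machinery import EXTENSION_SUFFIXES


def is_compiled(module_name):
    """Return True if the named module was loaded from a compiled extension file."""
    module = importlib.import_module(module_name)
    path = getattr(module, "__file__", None) or ""  # built-in modules have no __file__
    # EXTENSION_SUFFIXES lists the platform's extension endings, e.g. ".pyd" or ".so".
    return path.endswith(tuple(EXTENSION_SUFFIXES))


if __name__ == "__main__":
    for name in ("json", "csv"):  # replace with the project's own module names
        print(name, "compiled" if is_compiled(name) else "pure Python or built-in")
```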

Apologies for my waffle, I welcome your thoughts on how I can move forward on this.

Thanks in advance.
