Python and C++ script interaction


I am trying to optimize the interaction between two scripts I have. Two things I thought of are the c++ program not terminating unless you manually kill it, or generating all info in python before feeding it to c++.

Explanation of the problem:

What the scripts do: C++ program (not made by me, and I can't program in c++ very well): takes a 7 number array and returns a single number, simple. Python script (mine, and I can program a bit in python): generates those 7 number arrays, feeds them to the c++ program, waits for an answer and adds it to a list. It then makes the next array.

In theory, this works. However, as it is right now, it opens and closes the c++ program for each call. For one array that is no problem, but I'm trying to scale up to 25k arrays, and in the future to 6+ million arrays. Obviously it is then no longer feasible to open/close it each time, especially since the c++ program first has to load a 130 MB VCD file to function.

The first option I thought of was to generate all arrays in python first, then feed them to the c++ program and analyze all the results afterwards. However, I wouldn't know how to do this with 6M arrays. It is not important that the results I get back are in the same order as the arrays I feed in.

The second option I thought of was to make the c++ program not quit after each call. I can't program in c++, though, so I don't know whether it is possible to keep it 'alive' so that I can feed arrays into it whenever needed and get an answer back.

(Note: I cannot program in anything other than python, and want to do this project in python. The c++ program cannot be translated to python for speed reasons.)

Thanks in advance, Max.

There are 5 best solutions below

0
On

I think you're doing it wrong

What the scripts do: C++ program (not made by me, and I can't program in c++ very well): takes a 7 number array and returns a single number, simple. Python script (mine, and I can program a bit in python): generates those 7 number arrays, feeds them to the c++ program, waits for an answer and adds it to a list. It then makes the next array.

You have this?

python generate_arrays.py | someC++app | python gather_array.py

This lets the three parts run concurrently as separate processes, so the OS can schedule them on up to three cores at once.

If you're still not getting 100% CPU Load, you'll have to do something like this.

( python generate_arrays.py --even | someC++app >oneFile ) & ( python generate_arrays.py --odd | someC++app >anotherFile ) &
wait
python gather_array.py oneFile anotherFile

That will run two copies of python generate_arrays.py and two copies of your magical C++ program.

You'll have to rewrite your generate_arrays.py program so that it takes a command-line option. When the option is --even, you generate 3 million arrays. When the option is --odd, you generate the other 3 million arrays.

This (python | c++) & (python | c++) should get to 100% cpu use.
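A minimal sketch of how generate_arrays.py could split its work with such an option. The array generation itself is a placeholder (seeded random numbers), and the `--total` flag and function names are assumptions for illustration, not part of the original script. Both halves walk the same seeded sequence and keep every second array, so together they cover each array exactly once.

```python
import argparse
import random
import sys


def generate_arrays(parity, total, seed=0):
    """Yield the arrays at even (parity=0) or odd (parity=1) positions."""
    rng = random.Random(seed)
    for index in range(total):
        # Placeholder: the real script would build its 7 numbers here.
        array = [rng.random() for _ in range(7)]
        if index % 2 == parity:
            yield array


def main(argv=None):
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--even", dest="parity", action="store_const", const=0)
    group.add_argument("--odd", dest="parity", action="store_const", const=1)
    parser.add_argument("--total", type=int, default=6_000_000)
    args = parser.parse_args(argv)
    # One array per line, numbers separated by spaces, written to stdout
    # so the output can be piped into the next program.
    for array in generate_arrays(args.parity, args.total):
        sys.stdout.write(" ".join(repr(x) for x in array) + "\n")


# When saved as generate_arrays.py, call main() under the usual
# `if __name__ == "__main__":` guard.
```

Calling `main(["--even", "--total", "4"])` would print the two arrays at positions 0 and 2; the exact line format expected by the C++ program is not stated in the question, so the space-separated layout is only a guess.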

2
On

Firstly, just to be pedantic, there are no C++ scripts in normal use. C++ compiles, ultimately, to machine code, and the C++ program is properly referred to as a "program" and not a "script".

But to answer your question, you could indeed set up the C++ program to stay in memory, where it listens for connections and sends responses to your Python script. You'd want to study Unix IPC, particularly sockets.
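On the Python side, talking to such a long-running service might look roughly like the sketch below. The line-based text protocol (one array per line in, one number per line out) is an assumption, and `StandInHandler` is only a Python stand-in that sums the numbers so the example runs on its own; the real server would be the modified C++ program.

```python
import socket
import socketserver
import threading


class StandInHandler(socketserver.StreamRequestHandler):
    """Stand-in for the C++ service: replies with the sum of each line."""

    def handle(self):
        for line in self.rfile:
            numbers = [float(x) for x in line.decode().split()]
            self.wfile.write(f"{sum(numbers)}\n".encode())


def query_all(address, arrays):
    """Send each array as one text line and read back one number per line."""
    results = []
    with socket.create_connection(address) as conn, conn.makefile("rw") as stream:
        for array in arrays:
            stream.write(" ".join(map(str, array)) + "\n")
            stream.flush()
            results.append(float(stream.readline()))
    return results


def demo():
    # Port 0 asks the OS for any free port; a real service would use a fixed one.
    with socketserver.TCPServer(("127.0.0.1", 0), StandInHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            return query_all(server.server_address, [[1, 2, 3, 4, 5, 6, 7]])
        finally:
            server.shutdown()
```

The connection is opened once and reused for every array, which is the point of keeping the server in memory: the expensive data load happens only when the server starts.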

Another way to approach it would be to incorporate what the C++ program does into your Python script, and forget about C++ altogether.

2
On

Without the source code or the exact specifications of the Python script and the C++ program, it's difficult to provide more information, but you could modify the C++ code to repeatedly read the array from the standard input and then write the results to standard output.

Then you could use the Python subprocess module to launch the C++ program from your Python script and communicate with it.

Note that simply wrapping a loop around the main() function of the C++ program will not be very helpful, because apparently the main issue is the time the program needs in order to read its data (the VCD that you mentioned).

The loop needs to be strictly around the code that computes the result - which means that you may have to factor everything else out in a way that allows the result computation to be done repeatedly without each run contaminating the next ones.
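The Python half of this approach can be sketched as follows, assuming the C++ program has been changed to load its data once and then answer one line of output per line of input until end of input. Because that program is not available here, `STAND_IN_CHILD` is a small Python stand-in that sums the numbers; the text format is likewise an assumption. The C++ side would also need to flush its standard output after each result, since output to a pipe is normally buffered.

```python
import subprocess
import sys

# Stand-in for the modified C++ program (hypothetical): it reads one array
# per line and prints one result per line until it sees end of input.
STAND_IN_CHILD = r"""
import sys
for line in sys.stdin:
    print(sum(float(x) for x in line.split()), flush=True)
"""


def evaluate(arrays, command):
    """Start `command` once and push every array through the same process."""
    results = []
    with subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered pipes on the Python side
    ) as child:
        for array in arrays:
            child.stdin.write(" ".join(map(str, array)) + "\n")
            child.stdin.flush()
            results.append(float(child.stdout.readline()))
        child.stdin.close()  # end of input tells the child to leave its loop
    return results


def demo():
    command = [sys.executable, "-c", STAND_IN_CHILD]
    return evaluate([[1, 2, 3, 4, 5, 6, 7], [0.5] * 7], command)
```

With the real program, `command` would be the path to the C++ executable plus its arguments, and the process would be started only once for all 25k or 6M arrays.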

0
On

Okay, your best course of action is probably to write a C/C++ extension to Python that is able to call the C++ code that does the calculation you want. This is not terribly difficult; it only requires a minimal amount of C/C++ coding to make it work. A good explanation of extending Python can be found on the Python page at http://docs.python.org/extending/extending.html

What you in effect do is change your C++ program to be a dynamic library that the Python process can link in and call from the Python script.
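If the C++ code were built as a shared library, one alternative to writing a full extension module is the standard library's ctypes, which loads a library directly. This is a different technique from the extension API linked above, shown only as a sketch: the exported function `compute`, taking a pointer to 7 doubles and returning a double with C linkage (`extern "C"`), is hypothetical, as is the library path passed in.

```python
import ctypes

Array7 = ctypes.c_double * 7  # C type equivalent to double[7]


def to_c_array(values):
    """Convert a Python sequence of 7 numbers into a C double[7]."""
    if len(values) != 7:
        raise ValueError("expected exactly 7 numbers")
    return Array7(*values)


def load_compute(library_path):
    """Load the (hypothetical) shared library and wrap its compute() function."""
    lib = ctypes.CDLL(library_path)
    # Assumed C signature: double compute(const double *values);
    lib.compute.argtypes = [ctypes.POINTER(ctypes.c_double)]
    lib.compute.restype = ctypes.c_double

    def compute(values):
        return lib.compute(to_c_array(values))

    return compute
```

The library is loaded once per Python process, so any data it reads at start-up would stay in memory across calls, provided the C++ code is organised so that loading and computing are separate steps.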

If you need a bit of help getting it to work I'm sure we can help you out.

2
On

I think the best way is to build a C++ extension module for Python.
There are a lot of ways to do it.
If you have the C++ sources, you can try SWIG. After that you can use the C++ functions/objects directly inside Python, and manage them with Python modules (here, processing). It is really simple.