I am currently unpacking an encrypted file from the software I use to get an image (2048x2048) from a file (along with other information). I'm currently able to do this, but it takes about 1.7 seconds to load. Normally this would be fine, but I'm loading 40-ish images at each iteration, and my next step in this simulation is to add more iterations. I've been trying to use JIT compilers like PyPy and Numba. The code below is just one function in a larger object, but it's where I'm seeing the most time lag.
PyPy works, but when I call my numpy functions it takes twice as long. So I tried using Numba, but it doesn't seem to like unpack. I tried using Numba within PyPy, but that also doesn't work. My code goes a bit like this:
from struct import unpack
import numpy as np

def read_file(filename: str, nx: int, ny: int) -> tuple:
    with open(filename, "rb") as f:
        raw = [unpack('d', f.read(8))[0] for _ in range(2*nx*ny)]  # creates a 1D list
    real_image = np.asarray(raw[0::2]).reshape(nx, ny)       # every other point is the real part of the image
    imaginary_image = np.asarray(raw[1::2]).reshape(nx, ny)  # every other point +1 is the imaginary part of the image
    return real_image, imaginary_image
In my normal Python interpreter, the raw line takes about 1.7 seconds and the rest take <0.5 seconds.
If I comment out the numpy lines and just unpack in PyPy, the raw operation takes about 0.3 seconds. However, if I perform the reshaping operations, it takes a lot longer (I know it has to do with the fact that numpy is optimized in C and will take longer to convert).
So I just discovered Numba and thought I'd give it a try by going back to my normal Python interpreter (CPython?). If I add the @njit or @vectorize decorators to the function, I get the following error message:
File c:\Users\MyName\Anaconda3\envs\myenv\Lib\site-packages\numba\core\dispatcher.py:468, in _DispatcherBase._compile_for_args(self, *args, **kws)
464 msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
465 f"by the following argument(s):\n{args_str}\n")
466 e.patch_message(msg)
--> 468 error_rewrite(e, 'typing')
469 except errors.UnsupportedError as e:
470 # Something unsupported is present in the user code, add help info
471 error_rewrite(e, 'unsupported_error')
File c:\Users\MyName\Anaconda3\envs\myenv\Lib\site-packages\numba\core\dispatcher.py:409, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
407 raise e
408 else:
--> 409 raise e.with_traceback(None)
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'unpack': Cannot determine Numba type of <class 'builtin_function_or_method'>
I may be reading this error message wrong, but it seems that Numba does not like built-in functions? I haven't looked into any of the other options like Cython. Is there some way to make Numba or PyPy work? I'm mostly interested in speeding this operation up, so I'd be very interested to know what people think is the best option. I'd be willing to explore optimizing in C++, but I'm not aware of how to link the two.
Issuing tons of .read(8) calls and many small unpacks is dramatically increasing your overhead, with limited benefit. If you weren't using numpy already, I'd point you to preconstructing an instance of struct.Struct and/or using .iter_unpack to dramatically reduce the cost of looking up the Struct to use for unpacking, and to replacing a bunch of tiny read calls with a single bulk read (you need all the data in memory anyway). But since you're using numpy, you can have it do all the work for you much more easily: read the whole file straight into a numpy array. That replaces a bunch of relatively slow Python-level manipulation with a very fast bulk construction of a numpy array (it doesn't even unpack the data properly; it just interprets it in place as being of the expected type, which defaults to float, actually C doubles). No need for Numba; on my local box, for a 2048x2048 call, your code took ~1.75 seconds, this version took ~10 milliseconds.