I'm a self taught python programmer working on a hobby project, but I'm having some difficulty and would like to address what I see as a potential XY problem.
My app takes an input of an audio file (converts it to wav) and produces visual representations of the audio (90x90, RGB, frames) in the form of numpy arrays. I used to save these frames to a video file using open-cv, then use ffmpeg to scale the video and add the (original, non-wav) audio over the top, but this meant waiting until the app had finished to play the file. I would like to be able to play the audio and display the frames as they are generated, in sync. My generation code takes at maximum 8ms of a 16ms frame (60fps), so I have a reasonable amount of cycles to play with.
From my research, I have found that SDL is the tool that is most appropriate to display frames at high speeds, and have managed to make a simple system to display frames 'in time', by brute-force pixel editing. I have also discovered that SDL can play audio, and it even seems that I could synchronize this with the video as I would like, via the callback function. However, being a decidedly non-c programmer, I am at a loss as to how to best to display frames, as directly assigning pixels cannot be the safest or fastest, and I would like to scale the frames as the are displayed. I am also at a loss as to how best to convert numpy arrays to textures efficiently, as well as how best to control the synchronicity of my generation code, the audio, and video frames.
I'm not specifically looking for an answer to any of those problems, though advice would be appreciated, I'm just making sure that this is a reasonable way forward. Is SDL/pysdl2 coupled with numpy appropriate in this scenario? Or is this asking too much from python overall?