DISCLAIMER: I am pretty new to video processing and multimedia in general, so I am not really familiar with the terminology; sorry if I use some terms incorrectly.
A few weeks ago I faced a problem related to video transmission: how to transfer video from a CCTV camera (at a pretty remote place in the mountains) to a private server over a 3G network (due to the location of the camera, 4G is rarely available).
The camera is connected to a Raspberry Pi, which is connected to the internet via a 3G router. This part is important: the Raspberry Pi reads the RTSP stream from the camera and relays it to the server. I set it up this way because I needed the Raspberry Pi (the camera's side in general) to initiate the streaming, since the 3G/4G mobile operator couldn't give me a static IP address that the server could connect to. A side effect of this setup is that I can process every frame before sending it.
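Just to illustrate the relaying part: reading the RTSP stream on the Raspberry Pi is done with OpenCV's VideoCapture, which (when built with FFmpeg) accepts an RTSP URL directly. The URL below is only a placeholder for the actual camera settings; in the test code further down I open a local device instead.

import cv2

# Credentials, IP and path are placeholders for the real camera settings
cap = cv2.VideoCapture('rtsp://user:password@192.168.1.10:554/stream1')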
The main goal was to make the video portions (or slices, or pieces) as small as possible, so they take less bandwidth, which means faster transfer and therefore lower latency in the final video.
This is when I decided to use OpenCV on both ends, the Raspberry Pi and the server. When the Raspberry Pi reads a frame from the camera, it compares it to the previous frame and only sends the difference between the two to the server (if it is the first frame, no comparison is made and the whole frame is sent). On the server's end the difference is received, applied to the previous frame, and a new frame is generated (which becomes the previous frame for the next step).
This is the code I'm using for that, and since it is CCTV footage, pretty still, with little movement in the picture, this method works:
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ret, frame = cap.read()
prevFrame = frame
# Send prevFrame with zeromq

while cap.isOpened():
    ret, frame = cap.read()
    if ret is True:
        # Absolute per-pixel difference with the previous frame
        diff = cv2.absdiff(frame, prevFrame)
        mask = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        th = 10
        imask = mask > th                       # pixels that changed more than the threshold
        canvas = np.zeros_like(frame, np.uint8)
        canvas[imask] = frame[imask]            # keep only the changed pixels
        # Send canvas[imask] with zeromq
        prevFrame = frame
    else:
        break

cap.release()
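For completeness, this is roughly what happens on the receiving side. It is a simplified sketch rather than my exact server script: it assumes the whole sparse canvas (zeros where nothing changed) arrives over ZeroMQ, since the positions of the changed pixels are needed for reconstruction, and the socket type, port and frame shape are just placeholders.

import cv2
import numpy as np
import zmq

context = zmq.Context()
sock = context.socket(zmq.PULL)
sock.bind('tcp://*:5555')        # the Raspberry Pi connects out to this port

shape = (1080, 1920, 3)          # frame shape agreed with the sender (placeholder)

# First message: the full initial frame
prevFrame = np.frombuffer(sock.recv(), np.uint8).reshape(shape).copy()

while True:
    # Each following message: the sparse canvas with only the changed pixels
    canvas = np.frombuffer(sock.recv(), np.uint8).reshape(shape)
    changed = canvas.any(axis=2)           # where the canvas carries an update
    newFrame = prevFrame.copy()
    newFrame[changed] = canvas[changed]    # apply the changes to the previous frame
    cv2.imshow('reconstructed', newFrame)
    prevFrame = newFrame
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

One caveat of this reconstruction is that a pixel which genuinely turns completely black looks the same as an unchanged one, so the threshold and the "empty" value need a bit of extra care in practice.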
When I print its size in bytes before sending canvas[imask], I get somewhere around 100 KB to 300 KB, and on some frames as low as 9 KB.
This is a part of the video I am transmitting (basically cv2.imshow('frame', frame)):
And this is the same part showing ONLY the changes from the previous frame (cv2.imshow('canvas', canvas)):
My question is: I want to write the video to disk with the same approach in mind: writing the first frame to a file and then appending only the changes from the previous frame. Later on, for a player, I will use the script I'm currently using on the server's side to read the changes and apply them to the previous frame, so the video can be played back. This way I hope to save disk space while keeping video archives. However, when I do this as a text file (even if I encode it in binary), the resulting file is bigger than the original video. Is there a way to achieve this, and what approach should I take? Has this already been done by someone, and does it make sense at all? If not, what advice would you give me so I could spend less disk space on video archiving?
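To make the idea more concrete, this is roughly the file layout I have in mind (not my current code; the helper names are made up, and the PNG encoding is just one possible lossless compression for the mostly-zero canvas): write the first full frame as a record, then append one compressed record per canvas, each prefixed with its length so it can be read back sequentially.

import struct
import cv2
import numpy as np

def append_record(f, img):
    # Compress one frame (or sparse canvas) losslessly and append it to the archive
    ok, buf = cv2.imencode('.png', img)
    f.write(struct.pack('<I', len(buf)))   # 4-byte little-endian length prefix
    f.write(buf.tobytes())

def read_record(f):
    header = f.read(4)
    if len(header) < 4:
        return None                        # end of file
    (size,) = struct.unpack('<I', header)
    data = np.frombuffer(f.read(size), np.uint8)
    return cv2.imdecode(data, cv2.IMREAD_COLOR)

Playback would then be the same loop as on the server: the first record is the starting frame, and every following record is decoded and applied to the previous frame.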
This answer suggests that video compression works this way, but as I said, I am pretty new to that field, so I would be glad if you could provide some guidelines and explanation on this topic.
NB: While creating the above GIFs I noticed that the second one, containing only the changes, is significantly smaller (690 KB) than the first one (12.9 MB), which makes me think I'm on the right path.