Is there an efficient way to use ffmpeg to perform a large quantity of cuts from a single file?


I'm trying to cut video files into smaller chunks (each chunk is one word spoken in the video, so they're not all the same length).

I've tried a lot of different approaches to be as efficient as possible, but I can't get the runtime under two-thirds of the original video's length. That's a problem because I'm trying to process 400+ hours of video.

Is there a more efficient way to do this? Or am I doomed to run this for weeks?

Here is the command from my best attempt so far:

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -ss start_timestamp -t duration -i file_name -vf "fps=30,scale_cuda=1280:720" -c:v h264_nvenc -y output_file

(Note that -t takes a duration; to cut at an end timestamp instead, use -to.)

Note that the machine running the code has an RTX 4090. The command is executed from Python, which supplies the right timestamps and file paths for each smaller clip in a for loop.
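Here is a simplified sketch of that loop (the clips list and file paths below are placeholders, not the real pipeline):

import subprocess

file_name = "input.mp4"                             # placeholder
clips = [("00:01:02.500", "0.8", "word_0001.mp4")]  # placeholder (start, duration, output) tuples

for start, duration, output_file in clips:
    # One ffmpeg process per clip; -ss before -i gives fast input seeking
    subprocess.run([
        "ffmpeg",
        "-hwaccel", "cuda", "-hwaccel_output_format", "cuda",
        "-ss", start,
        "-t", duration,  # -t is a duration, not an end timestamp
        "-i", file_name,
        "-vf", "fps=30,scale_cuda=1280:720",
        "-c:v", "h264_nvenc",
        "-y", output_file,
    ], check=True)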

I suspect a lot of time is wasted spawning a new process for each clip, but I haven't been able to get better results with a split filter. Here's the ffmpeg-python code for that attempt:

Creation of the stream:

inp = (
    ffmpeg
    .input(file_name, hwaccel="cuda", hwaccel_output_format="cuda")
    .filter("fps", fps=30)                # resample to 30 fps
    .filter("scale_cuda", "1280", "720")  # scale on the GPU
    .filter_multi_output("split")         # split into multiple output pads
)

This is then consumed in a for loop (i is the loop index, and row comes from iterating over a DataFrame of start/end timestamps):

(
    ffmpeg
    .filter(inp[i], 'trim', start=row[1]['start'], end=row[1]['end'])  # i-th split output pad
    .filter('setpts', 'PTS-STARTPTS')  # reset timestamps so each clip starts at 0
    .output(output_file, vcodec='h264_nvenc')
    .run()  # each .run() still launches a separate ffmpeg process
)

1 Answer

Answered by kesh:

Do you know all the trim points before running FFmpeg? If so, your best bet is to use a complex filtergraph: one split filter feeding a set of parallel trim/setpts filter chains, producing all of the outputs in a single FFmpeg run.
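For illustration, here is a rough ffmpeg-python sketch of that idea (untested; the file name and clip list are placeholders, and merge_outputs combines everything into one invocation):

import ffmpeg

file_name = "input.mp4"                       # placeholder
clips = [(0.0, 0.8), (1.2, 1.9), (2.5, 3.1)]  # placeholder (start, end) pairs in seconds

# Decode and filter the input once, then split into one branch per clip
base = (
    ffmpeg
    .input(file_name, hwaccel="cuda", hwaccel_output_format="cuda")
    .filter("fps", fps=30)
    .filter("scale_cuda", "1280", "720")
    .filter_multi_output("split", len(clips))  # split=N output pads
)

# One trim/setpts chain and one output per clip, all in the same graph
outputs = []
for i, (start, end) in enumerate(clips):
    out = (
        base[i]
        .filter("trim", start=start, end=end)
        .filter("setpts", "PTS-STARTPTS")  # reset timestamps to start at 0
        .output(f"clip_{i:04d}.mp4", vcodec="h264_nvenc")
    )
    outputs.append(out)

# A single FFmpeg process encodes every clip
ffmpeg.merge_outputs(*outputs).run(overwrite_output=True)

On the command line, this is equivalent to a single -filter_complex with split=N followed by N trim,setpts chains, each mapped to its own output file.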

This eliminates the overhead of spawning a new FFmpeg subprocess and re-seeking the input for every clip, and FFmpeg may also be able to use more threads. That said, keep in mind that the most likely bottleneck is storage access, which could be why you're not meeting your performance goal.

How long does it take to process the file from beginning to end while outputting to null? If that baseline doesn't run in under 2/3 of the video's duration, no amount of optimizing the splitting will help. Also, out of curiosity, have you benchmarked the CPU and GPU filtering approaches to see which one is faster, and by how much?
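For that baseline test, something along these lines should work (the null muxer discards the output while still exercising decode, filters, and encode):

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i file_name -vf "fps=30,scale_cuda=1280:720" -c:v h264_nvenc -f null -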