I'm currently transcoding user-uploaded videos on a Linode server and using S3 to store them, but, being the optimist I am, I want to move the transcoding to Amazon EC2 so I can scale up if necessary, and also to gain experience with it. This is my workflow so far:
- Upload videos directly to S3, bypassing the nginx/Rails stack
- Create an entry in Amazon Simple Queue Service with the video URL
- See if any EC2 instances are running; if not, start one with a simple Ruby script as the startup script (I believe Amazon allows startup scripts under 6 KB)
- The Ruby script connects to the queue and runs the ffmpeg command to transcode the video
- The script checks whether there are any additional messages in the queue and processes them
- It shuts down the instance when the queue is empty (I might make it run the full hour, since Amazon charges a full hour for any part of an hour an EC2 instance is running)
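The queue-draining part of the steps above could be sketched roughly like this (a minimal sketch, not the real script: `queue` stands in for the SQS client, and the actual ffmpeg and shutdown calls are indicated in comments):

```ruby
# Minimal sketch of the worker loop. `queue` is anything that
# responds to #pop and returns nil when empty -- in the real
# script it would wrap SQS via an AWS gem.
def drain_queue(queue)
  transcoded = []
  while (video_url = queue.pop)
    # Real script: fetch the file from S3, then e.g.
    #   system("ffmpeg", "-i", local_path, output_path)
    transcoded << video_url
  end
  # Queue is empty here; the real script would shut the instance
  # down (or sleep out the rest of the billed hour first).
  transcoded
end
```

A plain Ruby Array satisfies the same `#pop`-returns-nil contract, which makes the loop easy to try locally before wiring in SQS.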
Obviously the above is basic and doesn't use EC2 to its full potential. I've thought about using threads to connect to the queue and run new jobs on the same EC2 instance, or about creating additional EC2 instances whose startup script runs the Ruby script. With the former, I'd need to limit the number of jobs on the same instance based on CPU usage.
The latter seems wasteful, but given that video transcoding is CPU-intensive, maybe two ffmpegs at the same time is not feasible. I've also thought about Amazon's Auto Scaling to create new instances, but using Ruby seems simpler and easier to me.
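For the first option, a fixed-size pool of worker threads is one way to cap concurrent jobs without polling CPU usage. A hedged sketch (the ffmpeg call is a placeholder; `workers` would be tuned to the instance's cores):

```ruby
require "thread"

# Sketch: N worker threads pull URLs from a shared queue, so at
# most `workers` jobs run at once. Each ffmpeg would run as a
# separate OS process via system(), so Ruby's threading
# limitations matter little -- the threads mostly sit blocked.
def run_jobs(urls, workers = 2)
  jobs = Queue.new
  urls.each { |u| jobs << u }
  done = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      loop do
        url = begin
          jobs.pop(true)        # non-blocking; raises when empty
        rescue ThreadError
          break
        end
        # system("ffmpeg", "-i", url, out_path) would go here
        done << url
      end
    end
  end
  threads.each(&:join)
  done.size
end
```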
- My question is: what is the best way to do this efficiently?
- Is running two ffmpegs at once considered bad practice? (Let's assume the videos average around 200 MB.)
- Are Ruby threads a good fit here, given the bad name they seem to have?
- Should I look into EventMachine instead of threads?
I don't want to run any EC2 instances unless necessary, and I want to get the maximum juice out of the instances without making my users wait too long for their videos to be transcoded.
Based on this article (http://stream0.org/2009/10/h264-video-encoding-on-amazons.html), the High-CPU Extra Large instance seems to be the best option. Of course I plan to do some of my own testing, but I wanted to get some expert opinion before I dive in. Thanks!
This turned out to be an essay, sorry for the length.
I ran some tests today on my local machine to measure CPU usage with multiple ffmpeg processes. I found the following command on the Internet, and so far it works decently: it encodes to FLV and reduces file size without a noticeable difference in quality. I know next to nothing about ffmpeg, so maybe the command is total crap (please let me know if it is). One problem is that it doesn't support threads in ffmpeg, but I think that might be a codec thing.
I used `top -b -d 0.5` in half-second intervals to measure CPU usage, and `grep Cpu` to get the relevant info. The files were about 150 MB in size and were encoded with the same ffmpeg command. I let each process run a little before starting a new one, and here are my results:

Based on the data, converting videos one by one is a gross underuse of resources; the ffmpeg processes are also pretty stable, minus a couple of spikes. Four instances of ffmpeg, at least on my machine, seems to be the most efficient.
I seem to be having trouble running two ffmpeg processes in parallel using `Thread` or `fork` with the `system` command. Does anyone have any thoughts on this? Especially on how to run two ffmpeg processes from a Ruby script?
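For reference, this is the pattern I'm attempting, sketched with `fork` + `exec` so the parent is never blocked by either encode. The commands here are harmless stand-ins; the real ones would be ffmpeg invocations like `["ffmpeg", "-i", "in1.mp4", "out1.flv"]`:

```ruby
# Sketch of two parallel jobs via fork + exec: each child process
# is replaced by its command, and Process.waitall reaps both.
# The sleep commands are stand-ins for the real ffmpeg calls.
cmds = [
  ["sleep", "0.1"],
  ["sleep", "0.1"],
]
pids = cmds.map do |cmd|
  fork { exec(*cmd) }       # child runs cmd; parent gets the pid
end
statuses = Process.waitall  # blocks until both children exit
```

Note that `fork` is unavailable on Windows; `Process.spawn` (Ruby 1.9+) is an alternative that starts a command without blocking and without an explicit fork.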