This is a fairly general question. I am new to all of this and am having trouble deciding whether a task should be treated as a batch job or as a simple request to a microservice.
Let's say users can upload 30-second videos to S3 and we want, for some reason, to process every video so that its individual frames are extracted and stored in S3 again. We do not really care about latency; minimizing compute costs, i.e., the cloud bill, is more important.
Would you use something like AWS Batch? Or would you deploy a set of microservices that do the processing by responding to API service calls, possibly with a messaging queue between S3 and the microservices?
Both options seem to do the job, but what is the right or correct way to approach such a problem?
I have read through https://docs.aws.amazon.com/batch/latest/userguide/best-practices.html. For short jobs, it recommends merging them so that each job runs for, ideally, 3 to 5 minutes. However, it does not really say what should not be considered a batch job.
I feel like this question should be googleable, but I might just be missing the right terminology here.
This is a common architectural decision. The choice between AWS Batch and a microservices-based approach (for example, one built on Lambda) depends on several factors:
AWS Batch:
Microservices:
As for your main concerns:
Cost: If minimizing compute costs is the priority, AWS Batch might be more cost-effective for this kind of bulk processing. Microservices that have to stay running between uploads may cost more to operate.
Operational Overhead: Microservices require more management and operational effort than AWS Batch, which is a fully managed service.
Latency: Since latency is not a concern for you, the time a job spends waiting in an AWS Batch queue should not matter, and videos can still be processed in parallel. Microservices offer more control when requests need an immediate response.
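To make the "merge short jobs" advice from the best-practices page concrete, here is a minimal sketch of grouping uploads so that each job should run for roughly four minutes. The per-video processing time is an estimate you would have to measure yourself, and the job name, the `VIDEO_KEYS` environment variable, the queue, and the job definition are hypothetical names used only for illustration.

```python
import json
import math


def chunk_videos(keys, est_seconds_per_video, target_job_seconds=240):
    """Group S3 keys so each group should take about target_job_seconds.

    est_seconds_per_video is an assumed, measured-by-you estimate of how
    long one video takes to process.
    """
    if est_seconds_per_video <= 0:
        raise ValueError("est_seconds_per_video must be positive")
    # Always put at least one video in a job, even if it runs long.
    per_job = max(1, math.floor(target_job_seconds / est_seconds_per_video))
    return [keys[i:i + per_job] for i in range(0, len(keys), per_job)]


def submit_chunks(chunks, job_queue, job_definition):
    """Submit one AWS Batch job per chunk (requires boto3 and credentials).

    job_queue and job_definition must already exist in your account; the
    container is assumed to read its work list from VIDEO_KEYS.
    """
    import boto3  # imported lazily so chunk_videos works without it

    batch = boto3.client("batch")
    job_ids = []
    for index, chunk in enumerate(chunks):
        response = batch.submit_job(
            jobName=f"extract-frames-{index}",
            jobQueue=job_queue,
            jobDefinition=job_definition,
            containerOverrides={
                "environment": [
                    {"name": "VIDEO_KEYS", "value": json.dumps(chunk)}
                ]
            },
        )
        job_ids.append(response["jobId"])
    return job_ids
```

With an estimate of 20 seconds per video, `chunk_videos` puts 12 videos in each job. For large chunks, it would probably be better to write the key list to S3 and pass only its location to the job, rather than the list itself.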
Ultimately, there's no one-size-fits-all answer. You might even consider a hybrid approach: AWS Batch for the bulk processing of videos and microservices for real-time or interactive processing, chosen based on the characteristics of each video upload. That dependence on your specific requirements is largely why this is a hard thing to Google.
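As a rough illustration of the hybrid idea, the sketch below routes each record of an S3 event notification to either the bulk path or the interactive path. The rule used here (a hypothetical `interactive/` key prefix) is only an assumption; you would substitute whatever characteristic of an upload matters in your case.

```python
from urllib.parse import unquote_plus


def route_upload(record, interactive_prefix="interactive/"):
    """Decide where one S3 event record should be processed.

    The prefix rule is a made-up example of an upload characteristic.
    """
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(record["s3"]["object"]["key"])
    bucket = record["s3"]["bucket"]["name"]
    target = "service" if key.startswith(interactive_prefix) else "batch"
    return {"bucket": bucket, "key": key, "target": target}


def route_event(event):
    """Route every record in an S3 event notification payload."""
    return [route_upload(record) for record in event.get("Records", [])]
```

Uploads routed to `"batch"` could then be collected and submitted in groups, while those routed to `"service"` would be handed to the microservice right away.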