Batch workloads vs calls to microservices

This is quite a general question. I am quite new to all of this and I am having trouble deciding if something should be considered a batch job or a simple request to a microservice.

Let's say users can upload 30-second videos to S3 and we want to, for some reason, process every video so that its individual frames are extracted and stored in S3 again. We do not really care about latency; minimizing compute costs, i.e., the cloud bill, is more important.

Would you use something like AWS Batch? Or would you deploy a set of microservices that do the processing by responding to API service calls, possibly with a messaging queue between S3 and the microservices?

Both options seem to do the job, but what is the right or correct way to approach such a problem?

I have read over https://docs.aws.amazon.com/batch/latest/userguide/best-practices.html. For short jobs, it says that you would need to merge jobs such that they run, ideally, 3 to 5 minutes each. However, it does not really talk about what should not be considered a batch job.

I feel like this question should be googleable, but I might just be missing the right terminology here.

There is 1 best solution below

This is a common architectural decision. The choice between AWS Batch and a microservices-based approach (for example, on Lambda) depends on several factors. Here is how the two options compare:

  1. AWS Batch:

    • Batch Processing: If your main goal is to process a large number of videos without real-time or low-latency requirements, AWS Batch is a suitable choice. It is designed for exactly this kind of workload.
    • Scalability: AWS Batch scales compute capacity up and down with the number of queued jobs, so it can process many videos in parallel.
    • Simplicity: It's a managed service that can simplify job scheduling, execution, and resource management for batch jobs.
  2. Microservices:

    • Real-time Processing: If you need immediate processing and real-time responses for each video upload, microservices with API endpoints may be more appropriate. This is useful if, for example, you want to provide immediate feedback to users or need to take different actions based on video content.
    • Custom Logic: Microservices offer flexibility to implement custom logic and business rules alongside the video processing. You can also handle different types of video processing requests.
    • Scalability: You can scale microservices individually, which may be helpful if you have varying processing demands for different types of videos.
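If you go the microservice route with a queue between S3 and the workers, as the question suggests, each worker first has to turn a queue message into a list of objects to process. The following is a minimal sketch of that step, assuming the message body is a standard S3 event notification in JSON. The function name `extract_video_objects` and the sample bucket and key are made up for illustration, and the actual frame extraction is left out.

```python
import json
from urllib.parse import unquote_plus


def extract_video_objects(message_body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for newly created objects in an S3 event notification."""
    event = json.loads(message_body)
    objects = []
    for record in event.get("Records", []):
        # Only react to object-creation events such as "ObjectCreated:Put".
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in notifications, with spaces sent as "+".
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


# Hypothetical message, shaped like an S3 event notification.
sample_message = json.dumps({
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-uploads"},
                "object": {"key": "clips/my+video.mp4"},
            },
        }
    ]
})

print(extract_video_objects(sample_message))
```

The same parsing works whether the worker is a long-running service polling the queue or a function invoked per message, so it does not lock you into either option.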

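As for your main concerns:

  • Cost: Since minimizing compute costs is your priority, AWS Batch is likely the more cost-effective option. There is no additional charge for the service itself, you pay only for the compute your jobs use, and it can run on EC2 Spot Instances. Microservices that run continuously may cost more, since you also pay for idle time between uploads.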

  • Operational Overhead: Microservices require more management and operational effort than AWS Batch, which is a fully managed service.

  • Latency: Since latency is not a concern for you, the scheduling and startup delay of AWS Batch jobs is acceptable, and it can still process many videos in parallel. Microservices mainly pay off when each request has to be handled immediately.
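On the cost side, the best-practices page linked in the question recommends merging short jobs so that each runs for roughly 3 to 5 minutes. A minimal sketch of that binning, assuming you have measured an average processing time per video yourself; `group_into_jobs`, `est_seconds_per_video`, and the 240-second default are illustrative choices, not part of any AWS API.

```python
def group_into_jobs(video_keys, est_seconds_per_video, target_job_seconds=240):
    """Split keys into chunks whose estimated total runtime is close to the target."""
    if est_seconds_per_video <= 0:
        raise ValueError("est_seconds_per_video must be positive")
    # How many videos fit into one job of the target length (at least one).
    per_job = max(1, int(target_job_seconds // est_seconds_per_video))
    return [video_keys[i:i + per_job] for i in range(0, len(video_keys), per_job)]


# Hypothetical example: 10 uploads, each estimated at 60 seconds of processing.
keys = [f"clips/video-{n}.mp4" for n in range(10)]
jobs = group_into_jobs(keys, est_seconds_per_video=60)
print([len(job) for job in jobs])  # [4, 4, 2]
```

Each chunk would then be submitted as a single job that loops over its videos, rather than one job per 30-second clip.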

Ultimately, there's no one-size-fits-all answer. You might even consider a hybrid approach: AWS Batch for the bulk processing of videos, and microservices for real-time or interactive processing, chosen based on the characteristics of each upload. That is also largely why this is hard to Google: the answer depends heavily on your specific requirements.
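As a rough illustration of the hybrid idea, the routing decision can be as small as a single function. This is only a sketch: the `Upload` type and its `needs_immediate_result` flag are hypothetical, and what actually separates "interactive" from "bulk" uploads depends on your application.

```python
from dataclasses import dataclass


@dataclass
class Upload:
    key: str
    needs_immediate_result: bool  # hypothetical flag, e.g. set by the uploading client


def route(upload: Upload) -> str:
    """Pick a processing path for one upload."""
    return "realtime" if upload.needs_immediate_result else "batch"


def partition(uploads):
    """Group upload keys by the path they were routed to."""
    routed = {"realtime": [], "batch": []}
    for upload in uploads:
        routed[route(upload)].append(upload.key)
    return routed


uploads = [
    Upload("clips/preview.mp4", needs_immediate_result=True),
    Upload("clips/archive-1.mp4", needs_immediate_result=False),
    Upload("clips/archive-2.mp4", needs_immediate_result=False),
]
print(partition(uploads))
```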