We want to create a dynamic flow based on input data in S3. Using the data available in S3 along with its metadata, we want to create dynamic clusters and dynamic tasks/transformation jobs in the system, and some jobs are dependency-based. I am sharing the expected flow here and want to know how efficiently we can do this using AWS services.
I am exploring AWS SWF, Data Pipeline, and Lambda, but I am not sure how to take care of dynamic tasks and dynamic dependencies. Any thoughts around this?
The data flow is explained in the attached image (refer to ETL Flow).
AWS Step Functions with S3 triggers should get the job done in a cost-effective and scalable manner.
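For the trigger side, a common pattern is an S3 event notification invoking a Lambda function that starts a Step Functions execution for each new object. Here is a minimal sketch; the state machine ARN and function names are placeholders, not real resources:

```python
import json


def lambda_handler(event, context, sfn_client=None):
    """Start a Step Functions execution for each S3 object in the event.

    `sfn_client` is injectable for testing; in Lambda it defaults to a
    real boto3 Step Functions client.
    """
    if sfn_client is None:
        import boto3  # created lazily so the handler can be tested offline
        sfn_client = boto3.client("stepfunctions")

    # S3 event notifications deliver one or more records per invocation
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Placeholder ARN -- replace with your actual state machine
    resp = sfn_client.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:EtlFlow",
        input=json.dumps({"bucket": bucket, "key": key}),
    )
    return resp["executionArn"]
```

Because the object key is passed as execution input, each run of the state machine can decide which transformation jobs to launch based on the incoming data, which is one way to get the "dynamic tasks" behavior.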
All steps are defined in the Amazon States Language:
https://states-language.net/spec.html
You can run jobs in parallel and wait for them to finish before you start your next job.
Below is one of the samples from AWS Step Functions.
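As a sketch (not an official AWS sample), here is a minimal state machine that runs two transformation jobs in parallel and then a dependent aggregation job once both finish; the Lambda ARNs are placeholders:

```json
{
  "Comment": "Sketch: two parallel transform jobs, then a dependent job",
  "StartAt": "TransformInParallel",
  "States": {
    "TransformInParallel": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "TransformA",
          "States": {
            "TransformA": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:TransformA",
              "End": true
            }
          }
        },
        {
          "StartAt": "TransformB",
          "States": {
            "TransformB": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:TransformB",
              "End": true
            }
          }
        }
      ],
      "Next": "Aggregate"
    },
    "Aggregate": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:Aggregate",
      "End": true
    }
  }
}
```

The `Parallel` state waits for all branches to complete before moving to `Next`, which is how you express the dependency of `Aggregate` on both transforms.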