Usually we store our code on GitHub and then deploy it to AWS Lambda.
We now have a challenge with a specific Node.js script:
- it takes roughly an hour to run, so we can't deploy it as a Lambda.
- it needs to run just once a month.
- once in a while we'll update the script in our GitHub repository, and we want the script in AWS to stay in sync when we make changes (e.g. using a pipeline)
- this script copies files from S3 and processes them locally; it does some heavy lifting with data.
What would be the recommended way to set this up on AWS?
A serverless approach fits nicely here, since the job runs only once a month, and data transfer between Lambda and S3 in the same region is free.

If Lambda suits your use case except for the execution time limit, and you can track the progress of the processing, you can create an AWS Step Functions state machine that invokes your Lambda in a loop until all S3 data chunks have been processed. Each Lambda invocation can run for up to 15 minutes, while a Standard state machine execution can run far longer than an hour (up to a year).

Regarding ops, you can add a trigger on your GitHub repository that publishes a new version of the Lambda on every change. AWS CloudFormation, the CDK, or any other suitable deployment tool works for that.
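The loop can be expressed directly in the Amazon States Language: a Task state invokes the Lambda, and a Choice state re-enters the Task until the Lambda reports it has processed every chunk. This is a minimal sketch; the function ARN, state names, and the `$.done` flag in the Lambda's output are assumptions you'd adapt to your setup.

```json
{
  "Comment": "Invoke the chunk-processing Lambda until it reports done (names are illustrative)",
  "StartAt": "ProcessChunk",
  "States": {
    "ProcessChunk": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-chunk",
      "Next": "AllChunksProcessed?"
    },
    "AllChunksProcessed?": {
      "Type": "Choice",
      "Choices": [
        { "Variable": "$.done", "BooleanEquals": false, "Next": "ProcessChunk" }
      ],
      "Default": "Done"
    },
    "Done": { "Type": "Succeed" }
  }
}
```

For the once-a-month requirement, an EventBridge schedule rule targeting this state machine can kick off the execution.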
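The GitHub-to-Lambda sync could be as simple as a GitHub Actions workflow that pushes new code on every merge to the main branch. A minimal sketch, assuming the function name, region, and secret names shown here (a CloudFormation/CDK deploy step would replace the CLI call in a fuller setup):

```yaml
# .github/workflows/deploy.yml — illustrative; function name, region, and secrets are assumptions
name: deploy-lambda
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: zip -r function.zip . -x '.git/*'
      - run: |
          aws lambda update-function-code \
            --function-name process-chunk \
            --zip-file fileb://function.zip \
            --publish
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: us-east-1
```

The `--publish` flag makes each deploy a new numbered Lambda version, matching the "publish a new version" idea above.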