Gathering a large amount of data from an external API in AWS

I'm creating a serverless application in AWS that needs to store data gathered through an external API in a DynamoDB table.

I'm trying to do this with Lambda functions, but the data is quite heavy, so every time I try to perform some data manipulation on it I get a "timeout error", even though my timeout is set to a few minutes (the same script takes less than 10 s to run on my computer). I also tried only fetching the data and parsing it with json.loads(), but I still get the timeout error.

Looking around the internet, I saw there are quite a few ways to pull data from an external API endpoint in AWS, such as Glue or AppFlow.

My questions are:

  1. Is Lambda a good choice for this type of task?
  2. What could cause my "timeout" problems?
  3. Do you suggest better alternatives to accomplish this task?

Thank you in advance

There are 2 best solutions below

Is Lambda a good choice for this type of task?

Lambda has a maximum timeout of 15 minutes. If your work could take longer than that, you can try a couple of things:

  1. Increase your Lambda memory
  2. Split the work across multiple Lambdas

What could cause my "timeout" problems?

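Make sure the task isn't stuck in a loop, and add adequate logging so you can see where the time goes. Also increase the Lambda memory: Lambda allocates CPU in proportion to the configured memory, so with enough memory it should be able to match your local machine's time.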
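
One way to get that logging is to time each stage separately, so the CloudWatch logs show which step is eating the time. The sketch below is only an assumption about how the handler might be organised: the `fetch` helper, the stage names, and the `id` filter are illustrative. It also sets an explicit timeout on the HTTP call, since a request that never returns would look exactly like a Lambda timeout.

```python
import json
import logging
import time
import urllib.request
from contextlib import contextmanager

logger = logging.getLogger()
logger.setLevel(logging.INFO)


@contextmanager
def timed(stage, timings):
    """Record and log how long the wrapped block takes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage] = elapsed
        logger.info("stage=%s took %.3fs", stage, elapsed)


def fetch(url, timeout=10):
    """Download the raw body; fail fast instead of hanging until Lambda times out."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def process(raw, context=None):
    """Parse and filter the payload, timing each stage."""
    timings = {}
    with timed("parse", timings):
        records = json.loads(raw)
    with timed("transform", timings):
        # Illustrative transformation: keep only records that carry an id.
        cleaned = [r for r in records if r.get("id") is not None]
    if context is not None:
        # The Lambda context object reports the time left before the timeout.
        logger.info("remaining_ms=%d", context.get_remaining_time_in_millis())
    return cleaned, timings
```

If the logs show the time is lost before "parse" even starts, the problem is more likely the network call than the data manipulation.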

Do you suggest better alternatives to accomplish this task?

It depends. If it's a large amount of data (GB+), I would suggest using AWS Glue.

In addition to the points mentioned by Leeroy Hannigan, with respect to question 3:

You can check whether the workload can be split and use AWS Step Functions to orchestrate it.

For example, you can use separate Lambda functions to split, iterate over, transform, and then load the data.
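
A minimal sketch of what such a state machine could look like, written as a Python function that builds the Amazon States Language definition. The three Lambda ARNs are placeholders you would supply, and the state names, the `$.chunks` path, and the concurrency value are assumptions for illustration. State payloads are size-limited, so the split step would usually return references to the chunks rather than the data itself.

```python
import json


def build_definition(split_arn, transform_arn, load_arn, max_concurrency=10):
    """Return a split -> transform (in parallel) -> load state machine definition."""
    return {
        "Comment": "Split the input, transform each chunk, then load the results",
        "StartAt": "Split",
        "States": {
            "Split": {
                "Type": "Task",
                "Resource": split_arn,
                # The split Lambda is assumed to return a list of chunks.
                "ResultPath": "$.chunks",
                "Next": "TransformEach",
            },
            "TransformEach": {
                # A Map state runs its inner workflow once per list item.
                "Type": "Map",
                "ItemsPath": "$.chunks",
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "Transform",
                    "States": {
                        "Transform": {
                            "Type": "Task",
                            "Resource": transform_arn,
                            "End": True,
                        }
                    },
                },
                "ResultPath": "$.transformed",
                "Next": "Load",
            },
            "Load": {"Type": "Task", "Resource": load_arn, "End": True},
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs; replace them with the real function ARNs.
    definition = build_definition(
        "arn:aws:lambda:REGION:ACCOUNT_ID:function:split",
        "arn:aws:lambda:REGION:ACCOUNT_ID:function:transform",
        "arn:aws:lambda:REGION:ACCOUNT_ID:function:load",
    )
    print(json.dumps(definition, indent=2))
```

The printed JSON can be pasted into the Step Functions console or passed as the definition when creating the state machine. Each Lambda then only handles one chunk, which keeps every invocation well under the 15-minute limit.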