Upload 20 million images to an S3 bucket from URLs

I want to upload 20 million images to an S3 bucket. I am using the following code:

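// Assumes `s3` is an AWS SDK S3 client and `objectParams` holds Bucket, Key and Body.
// Wraps the callback-style putObject call in a Promise.
const upload = (objectParams) =>
  new Promise((resolve, reject) => {
    s3.putObject(objectParams, (err, data) => {
      if (err) {
        reject(err);
      } else {
        resolve(data);
      }
    });
  });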

I have Cloudinary image URLs, and I want to upload those images to S3.

What is the fastest way to do that? I am currently running this code sequentially, but it is going to take a long time. Can I make it faster using the REST API?

Can anyone please help?

There is 1 solution below

Personally, I would do the following:

  • Write an AWS Lambda function that accepts URLs, then for each URL:
    • Download the file (default temp storage = 512 MB)
    • Upload the file to S3
    • Delete the local (temp) file
  • Create an Amazon SQS queue and configure it to trigger the Lambda function when messages are received (passing up to 10 messages at a time)
  • Write a small script to push the URLs into the SQS queue
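The steps above could be sketched roughly as follows. This is a minimal illustration, not a tested deployment: `makeHandler`, `keyFromUrl` and `toSqsBatches` are hypothetical helper names, and the `download` and `upload` functions are injected so the logic can be tried without AWS credentials (in a real function they would wrap an HTTP GET and an S3 `putObject` call). For brevity the sketch passes the image bytes through memory instead of writing a temp file, which is a simplification of the approach described. It also assumes each SQS message body is a single URL and that partial batch responses (`ReportBatchItemFailures`) are enabled on the trigger.

```javascript
// Build an SQS-triggered Lambda handler. Each SQS record body is one image URL.
function makeHandler({ download, upload, keyFor }) {
  return async function handler(event) {
    const batchItemFailures = [];
    for (const record of event.Records) {
      try {
        const url = record.body;
        const bytes = await download(url);   // fetch the image
        await upload(keyFor(url), bytes);    // store it in S3
      } catch (err) {
        // Report only this message as failed so SQS retries it alone.
        batchItemFailures.push({ itemIdentifier: record.messageId });
      }
    }
    return { batchItemFailures };
  };
}

// Hypothetical key scheme: reuse the URL path as the S3 object key.
function keyFromUrl(url) {
  return new URL(url).pathname.replace(/^\/+/, '');
}

// Producer side: group URLs into SendMessageBatch-sized entries
// (SQS accepts at most 10 entries per SendMessageBatch call).
function toSqsBatches(urls, batchSize = 10) {
  const batches = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    batches.push(
      urls.slice(i, i + batchSize).map((url, j) => ({
        Id: `msg-${i + j}`, // must be unique within one batch
        MessageBody: url,
      }))
    );
  }
  return batches;
}
```

Each array returned by `toSqsBatches` is shaped to be passed as the `Entries` of one SendMessageBatch request, so 20 million URLs would need about 2 million such calls.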

This will start many Lambda functions in parallel (the default limit is 1,000 concurrent executions per Region), which will all be copying the files for you.
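To get a feel for what that parallelism buys, here is a back-of-the-envelope estimate. The figure of 2 seconds per image (download plus upload) is purely an assumption for illustration, as is the idea that all 1,000 slots stay busy the whole time; `estimateHours` is a hypothetical helper.

```javascript
// Rough wall-clock estimate: total work divided evenly across concurrent workers.
function estimateHours({ totalImages, concurrency, secondsPerImage }) {
  const totalSeconds = (totalImages * secondsPerImage) / concurrency;
  return totalSeconds / 3600;
}

const parallelHours = estimateHours({
  totalImages: 20_000_000,
  concurrency: 1000,     // default concurrency limit mentioned above
  secondsPerImage: 2,    // assumed average; measure your own
});

const sequentialHours = estimateHours({
  totalImages: 20_000_000,
  concurrency: 1,        // one image at a time, as in the question
  secondsPerImage: 2,
});

console.log(parallelHours.toFixed(1));          // 11.1 (hours)
console.log((sequentialHours / 24).toFixed(0)); // 463 (days)
```

Under these assumptions the job drops from over a year of sequential copying to roughly half a day, though the source host's rate limits may become the real bottleneck.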

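The only problem would be files larger than 512 MB, which is the default size of the temporary storage (/tmp) provided by Lambda. That storage can be configured up to 10,240 MB per function if larger files are expected.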