AWS S3 PHP SDK Batch Upload Thousands of Files?


I'm looping through my image files and calling $s3Client->putObject($object); for each one, but with hundreds of thousands of files, I can see this is going to take forever.
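For reference, here is roughly what my current loop looks like (bucket name and paths are placeholders):

$files = glob('/path/to/images/*');

foreach ($files as $file) {
    // One blocking HTTP request per file -- this is what makes it slow
    $s3Client->putObject([
        'Bucket'     => 'my-bucket',        // placeholder
        'Key'        => basename($file),
        'SourceFile' => $file,
    ]);
}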

I can't seem to find any batch upload examples for the PHP SDK. Does the PHP SDK provide some kind of batch upload method that will process them more efficiently?

1 Answer

Short answer: no, the PHP SDK does not provide a batch upload method for S3. However, you can get much better performance by sending your uploads concurrently.

I've put together a simple proof of concept for your issue that uploads to the bucket concurrently. Since you didn't mention your versions, I built it against the latest AWS PHP SDK available at the time of writing (3.171.2). You will also need Guzzle if you don't already have it; I'm using Guzzle 7.2, but this code should work on Guzzle 6+ as well.

For my PoC I'm using putObjectAsync, but for larger individual files you could also look at Multipart Uploads (there's a short sketch at the end of this answer).

My setup:
- php: 7.4.13
- guzzle: 7.2.0
- aws-sdk-php: 3.171.2

<?php

require 'vendor/autoload.php'; // Composer autoloader (aws/aws-sdk-php + guzzlehttp/guzzle)

$files = glob('/path/to/your/files/*'); // Returns an array of all files in your folder

try {
    // Init the S3 client
    $s3Client = new \Aws\S3\S3Client([
        'version'       => 'latest',
        'region'        => '', // Desired AWS region
        'credentials'   => [
            'key'       => '', // Your AWS key
            'secret'    => '', // Your AWS key secret
        ],
    ]);

    // Generator that lazily yields one upload promise per file
    $uploads = function ($files) use ($s3Client) {
        foreach ($files as $file) {
            yield $s3Client->putObjectAsync([
                'Bucket'        => '', // Name of your bucket
                'Key'           => basename($file),
                'SourceFile'    => $file,
            ]);
        }
    };

    // Execute the requests with Guzzle, because $s3Client->putObjectAsync() returns a \GuzzleHttp\Promise\Promise
    \GuzzleHttp\Promise\Each::ofLimit(
        $uploads($files),
        3, // How many requests to run concurrently
        function ($response, $index) { // Callback on success
            var_dump('Success: ' . $index);
        },
        function ($reason, $index) { // Callback on failure
            var_dump('Error: ' . $index);
        }
    )->wait();
} catch (\Aws\S3\Exception\S3Exception $e) {
    var_dump($e->getMessage());
}

Expected output will be something like this:

/var/www/aws-s3-bulk-async-upload-poc.php:30:string 'Success: 2' (length=10)
/var/www/aws-s3-bulk-async-upload-poc.php:30:string 'Success: 0' (length=10)
/var/www/aws-s3-bulk-async-upload-poc.php:30:string 'Success: 1' (length=10)
/var/www/aws-s3-bulk-async-upload-poc.php:30:string 'Success: 4' (length=10)
/var/www/aws-s3-bulk-async-upload-poc.php:30:string 'Success: 6' (length=10)
/var/www/aws-s3-bulk-async-upload-poc.php:30:string 'Success: 7' (length=10)
/var/www/aws-s3-bulk-async-upload-poc.php:30:string 'Success: 5' (length=10)
/var/www/aws-s3-bulk-async-upload-poc.php:30:string 'Success: 9' (length=10)
/var/www/aws-s3-bulk-async-upload-poc.php:30:string 'Success: 3' (length=10)
/var/www/aws-s3-bulk-async-upload-poc.php:30:string 'Success: 10' (length=11)
/var/www/aws-s3-bulk-async-upload-poc.php:30:string 'Success: 11' (length=11)
/var/www/aws-s3-bulk-async-upload-poc.php:30:string 'Success: 8' (length=10)
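
And if some of your individual files are large, the SDK also ships a MultipartUploader helper that splits a single file into parts. A minimal sketch (bucket name, region and file path are placeholders, and this uploads just one file, not the whole batch):

<?php

require 'vendor/autoload.php';

use Aws\S3\MultipartUploader;
use Aws\Exception\MultipartUploadException;

$s3Client = new \Aws\S3\S3Client([
    'version' => 'latest',
    'region'  => '', // Desired AWS region
]);

$file = '/path/to/your/files/big-image.tif'; // placeholder

$uploader = new MultipartUploader($s3Client, $file, [
    'bucket' => '', // Name of your bucket
    'key'    => basename($file),
]);

try {
    $result = $uploader->upload();
    var_dump('Uploaded to ' . $result['ObjectURL']);
} catch (MultipartUploadException $e) {
    var_dump($e->getMessage());
}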

Hope this helps you solve your issue.