I want to transfer large datasets from Amazon S3 to Azure Blob Storage. Can anyone help me with how I can modify my code to handle large datasets? Below is my code in Java:
import com.microsoft.azure.storage.*;
import com.microsoft.azure.storage.blob.*;
import java.net.URI;

try {
    // Build the blob client from the storage account name and key
    CloudStorageAccount storageAccount = new CloudStorageAccount(
            new StorageCredentialsAccountAndKey(azureCredentialsDto.getStorageAccountName(),
                    azureCredentialsDto.getStorageAccountKey()), true);
    CloudBlobClient blobClient = storageAccount.createCloudBlobClient();
    CloudBlobContainer container = blobClient.getContainerReference(azureCredentialsDto.getBlobContainerName());
    log.info("Creating container: " + container.getName());
    container.createIfNotExists(BlobContainerPublicAccessType.CONTAINER, new BlobRequestOptions(), new OperationContext());
    CloudBlockBlob blob = container.getBlockBlobReference(destinationFileName);
    URI blockBlobUri = blob.getUri();
    log.info("Blob URI: " + blockBlobUri);
    // sourceFileUrl is the URL of the Amazon S3 file I want to copy
    blob.startCopy(new URI(sourceFileUrl));
    log.info("Copy started...");
} catch (Exception e) {
    log.error("Copy from S3 to Azure Blob Storage failed", e);
}
If you just need to transfer large files, the best option is to use the Copy activity in Azure Data Factory (ADF). AzCopy and ADF are the two best approaches when you need to move large files.
To use AzCopy, refer to the article Move your data from AWS S3 to Azure Storage using AzCopy.
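If you want to kick off AzCopy from your Java code instead of running it by hand, a minimal sketch using ProcessBuilder could look like this (it assumes AzCopy v10 is installed and on the PATH; the bucket, storage account, container and SAS token below are placeholders you would replace):

import java.io.IOException;

public class AzCopyS3ToBlob {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholders: replace with your own S3 bucket and Azure container + SAS token.
        String source = "https://s3.amazonaws.com/<my-bucket>/";
        String destination = "https://<my-account>.blob.core.windows.net/<my-container>?<sas-token>";

        ProcessBuilder pb = new ProcessBuilder("azcopy", "copy", source, destination, "--recursive=true");
        // AzCopy picks up the S3 credentials from these environment variables.
        pb.environment().put("AWS_ACCESS_KEY_ID", "<access-key-id>");
        pb.environment().put("AWS_SECRET_ACCESS_KEY", "<secret-access-key>");
        pb.inheritIO(); // stream AzCopy's progress output to the console

        int exitCode = pb.start().waitFor();
        if (exitCode != 0) {
            throw new IllegalStateException("azcopy exited with code " + exitCode);
        }
    }
}

The nice part is that AzCopy performs a service-to-service copy, so the data goes from S3 to Blob Storage directly and the JVM only orchestrates the job.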
To accomplish it using ADF, refer to the links below:
https://www.youtube.com/watch?v=9uXDt0DP9cs&ab_channel=TechBrothersIT
Azure Data Factory V2 Pipelines for Copying Large AWS S3 Buckets to Azure Storage
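If you would rather keep the Java SDK approach from your question, be aware that startCopy only schedules an asynchronous server-side copy and returns immediately; for large blobs you normally have to poll the copy state until it finishes. A rough sketch with the same legacy SDK (the waitForCopy helper and the 5-second poll interval are just illustrative, not part of the SDK):

import com.microsoft.azure.storage.StorageException;
import com.microsoft.azure.storage.blob.CloudBlockBlob;
import com.microsoft.azure.storage.blob.CopyStatus;

public class CopyPoller {
    // Blocks until the server-side copy of the given blob has finished.
    static void waitForCopy(CloudBlockBlob blob) throws StorageException, InterruptedException {
        blob.downloadAttributes(); // refresh the blob's properties, including its copy state
        while (blob.getCopyState().getStatus() == CopyStatus.PENDING) {
            Thread.sleep(5_000); // poll every 5 seconds; tune this for your file sizes
            blob.downloadAttributes();
        }
        if (blob.getCopyState().getStatus() != CopyStatus.SUCCESS) {
            throw new IllegalStateException("Copy finished with status " + blob.getCopyState().getStatus());
        }
    }
}

You would call waitForCopy(blob) right after blob.startCopy(...) in your snippet. That said, for really large datasets the ADF or AzCopy route above is usually the more robust choice.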