I am using Get-AzStorageBlob to read the blobs in an Azure Blob Storage container. It works fine when I have around 200k blobs, but the container now holds over 3 million blobs and I am no longer able to read them all with a single Get-AzStorageBlob call.
I changed the code as shown below to read chunks of 5,000 blobs (I also tried smaller amounts such as 1,000), but it still hangs, usually after reading about 500k blobs. I set -ServerTimeoutPerRequest and -ClientTimeoutPerRequest to 10 seconds, but eventually Get-AzStorageBlob still hangs, and no exception is thrown after those 10 seconds.
I honestly do not know what to do anymore. I never had such issues with AWS S3, and I am surprised by how much work it takes to make this API behave properly, which I still have not managed to do.
'''
function ScanBlobBatch {
    $blobs = @()        # accumulates every blob returned across batches
    $token = $null
    $maxReturn = 5000
    # Maximum number of retries per batch
    $maxRetries = 3
    # Delay between retries (in seconds)
    $retryDelay = 5
    # Breathing delay between batches (in seconds)
    $breathDelay = 1
    do {
        # Scan one batch, retrying on failure
        $retryCount = 0
        do {
            try {
                $blobs += Get-AzStorageBlob -Container $sourceContainer -Context $ctxSource -MaxCount $maxReturn -ContinuationToken $token -ServerTimeoutPerRequest 10 -ClientTimeoutPerRequest 10
                Write-Host "total read so far:" $blobs.Count
                break
            }
            catch {
                # increase the retry counter and wait before trying again
                $retryCount++
                if ($retryCount -lt $maxRetries) {
                    Start-Sleep -Seconds $retryDelay
                }
                else {
                    exit 1
                }
            }
        } while ($retryCount -lt $maxRetries)
        # End of the current batch: stop if nothing was returned,
        # otherwise take the continuation token from the last blob read
        if ($blobs.Count -le 0) { break }
        $token = $blobs[$blobs.Count - 1].ContinuationToken
        Start-Sleep -Seconds $breathDelay  # time to breathe!
    } while ($null -ne $token)
    return $blobs
}
'''
I would advise you to use parallel processing, for example by splitting the listing by blob-name prefix and walking the prefixes concurrently, as in the sketch below.
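A minimal sketch of that idea, assuming PowerShell 7+ (for ForEach-Object -Parallel) and reusing the $sourceContainer / $ctxSource variables from your function; the one-character prefix list is an assumption you would adapt to how your blob names are actually distributed:
'''
# Sketch only: list the container in parallel, one worker per blob-name prefix.
# Assumes PowerShell 7+ (ForEach-Object -Parallel) and that blob names are
# spread across these prefixes - adjust the list to your own naming scheme.
$prefixes = '0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f'

$allBlobs = $prefixes | ForEach-Object -Parallel {
    $prefix = $_
    $batch  = @()
    $token  = $null
    do {
        # -Prefix keeps each worker on its own slice of the namespace,
        # paging with the same MaxCount / ContinuationToken pattern as your function
        $page = Get-AzStorageBlob -Container $using:sourceContainer -Context $using:ctxSource `
            -Prefix $prefix -MaxCount 5000 -ContinuationToken $token
        $batch += $page
        $token = if ($page.Count -gt 0) { $page[$page.Count - 1].ContinuationToken } else { $null }
    } while ($null -ne $token)
    $batch
} -ThrottleLimit 4

Write-Host "total blobs listed:" $allBlobs.Count
'''
The prefix list and -ThrottleLimit 4 are only placeholders; the point is that no single listing loop has to page through all 3 million blobs, so a hang or timeout only affects one slice and that slice can be retried on its own.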
For more information, please read through this article