Get-AzStorageBlob often hangs for a large blob container


I am using Get-AzStorageBlob to read the blobs in an Azure Blob Storage container. It works fine with around 200k blobs, but my container now holds over 3 million, and I can no longer read them all with a single Get-AzStorageBlob call.

I changed the code as shown below to read chunks of 5000 blobs (I also tried smaller sizes, like 1000), but it still hangs, usually after reading about 500k blobs. I set both -ServerTimeoutPerRequest and -ClientTimeoutPerRequest to 10 seconds, but eventually Get-AzStorageBlob hangs anyway, and no exception is thrown after those 10 seconds.

I honestly do not know what to do anymore. I never had such issues with AWS S3, and I am surprised by the amount of work needed to make this API behave properly, which I still have not managed to do.

```powershell
function ScanBlobBatch {

    $token = $null
    $maxReturn = 5000

    # Maximum number of retries per batch
    $maxRetries = 3

    # Delay between retries (in seconds)
    $retryDelay = 5

    # Pause between batches (in seconds)
    $breathDelay = 1

    # Use a List to avoid the O(n^2) cost of growing an array with +=
    $blobs = [System.Collections.Generic.List[object]]::new()

    do {
        # Fetch the next batch, retrying on failure
        $retryCount = 0
        do {
            try {
                $batch = Get-AzStorageBlob -Container $sourceContainer -Context $ctxSource `
                    -MaxCount $maxReturn -ContinuationToken $token `
                    -ServerTimeoutPerRequest 10 -ClientTimeoutPerRequest 10
                break
            }
            catch {
                # Increase the retry counter; throw instead of exit so the
                # caller's session is not terminated
                $retryCount++
                if ($retryCount -lt $maxRetries) {
                    Start-Sleep -Seconds $retryDelay
                }
                else {
                    throw "Giving up after $maxRetries failed attempts: $_"
                }
            }
        } while ($retryCount -lt $maxRetries)
        # end of current scan batch

        if (-not $batch -or $batch.Count -eq 0) { break }
        $blobs.AddRange($batch)
        Write-Host "total read so far:" $blobs.Count

        # The continuation token lives on the last blob of the batch
        $token = $batch[-1].ContinuationToken
        Start-Sleep -Seconds $breathDelay # time to breathe!
    } while ($null -ne $token)

    return $blobs
}
```
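One likely contributor to the hang is holding millions of blob objects in memory at once. As a minimal sketch (assuming the same `$sourceContainer` and `$ctxSource` variables from the question, and a hypothetical function name), each page can instead be streamed straight to the pipeline so nothing accumulates:

```powershell
# Sketch: stream blob pages to the pipeline instead of collecting them all.
# Assumes $sourceContainer and $ctxSource are defined as in the question.
function Get-BlobsStreamed {
    param(
        [string]$Container,
        $Context,
        [int]$PageSize = 5000
    )

    $token = $null
    do {
        $page = Get-AzStorageBlob -Container $Container -Context $Context `
            -MaxCount $PageSize -ContinuationToken $token
        if (-not $page) { break }

        # Emit each blob downstream immediately; nothing is retained here
        $page | Write-Output

        $token = $page[-1].ContinuationToken
    } while ($null -ne $token)
}

# Usage: process blobs page by page, e.g. counting names without
# keeping 3 million objects alive:
# Get-BlobsStreamed -Container $sourceContainer -Context $ctxSource |
#     ForEach-Object { $_.Name } | Measure-Object
```

This keeps memory roughly constant at one page (5000 blobs) regardless of container size.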

1 Answer

I would advise you to use parallel processing with `ForEach-Object -Parallel`.

For example:

```powershell
# Define the number of parallel threads to use
$threadCount = 10

# Run the parallel processing script
$results = $blobs | ForEach-Object -ThrottleLimit $threadCount -Parallel $readscript
```

Where:

- `$blobs`: the list of blobs in the container
- `$readscript`: the script block that reads the blobs
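Note that `ForEach-Object -Parallel` requires PowerShell 7+. As a hypothetical example of what `$readscript` might look like (the destination folder and the use of `Get-AzStorageBlobContent` are assumptions, not from the question), outer variables must be passed into each runspace via `$using:`:

```powershell
# Assumed destination folder; adjust as needed
$destFolder = "C:\blob-dump"

# Hypothetical $readscript: download each blob's content in parallel.
# Each runspace receives outer variables through the $using: scope.
$readscript = {
    Get-AzStorageBlobContent -Blob $_.Name `
        -Container $using:sourceContainer `
        -Context $using:ctxSource `
        -Destination $using:destFolder -Force
}

$threadCount = 10
$results = $blobs | ForEach-Object -ThrottleLimit $threadCount -Parallel $readscript
```

Keep the throttle limit modest; very high parallelism against one storage account can itself trigger server-side throttling.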

For more information, please read through this article