Background:
I have an Azure Blob Storage account with many files in it, organized in folders like this: /upload/yyyy/mm/dd/*.csv
My function is a BlobTrigger, like so:
```csharp
[FunctionName("ReadUploadedFile")]
public static async Task ReadUploadedFile(
    [BlobTrigger("upload/{year}/{month}/{day}/{name}", Connection = "MyStorage")] Stream myBlob,
    string name,
    ILogger log)
```
The expectation is that when I post a new file to the storage account "MyStorage" points to, the function will trigger and I can read the file.
The Problem:
I ran the function in debug mode in Visual Studio (2022), and the trigger fired for files that were already in the storage account. I'm not sure if this is an artifact of the debug environment (I'm using local.settings.json to point to the storage account), or if it will happen again should I publish the function to Azure. I mean... if I wanted to iterate the storage account, there are easier ways.
What I want:
I want the function to just sit there doing nothing unless I post a new blob to the storage account while it's running. Where am I going wrong?
What I tried:
I tried stopping the debug session and restarting. The second session appears to be picking up where the first one left off. That is, I'm still seeing triggers from old files, but not the ones that already triggered during the first run. I get the impression that eventually I can "drain" this backlog, but for a variety of reasons I don't want to reprocess all these files whenever I check the project out and start the debugger.
It seems like someone, somewhere is keeping track of which files I've processed...?
The Azure Functions infrastructure has to keep track of which blobs you've already processed. It does this by storing state, known as blob receipts, in the azure-webjobs-hosts container of your function app's storage account. You'll have a different storage account for your local debug session (which is probably using Azurite) than for your deployed Azure instance.
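You can see this state for yourself. Here's a minimal sketch, assuming the Azure.Storage.Blobs package, that lists the receipts in whichever storage account the Functions host is using (the exact path layout under the blobreceipts/ prefix is an internal detail and may vary by host version):

```csharp
using Azure.Storage.Blobs;

// Connection string for the storage account the Functions host uses
// (the AzureWebJobsStorage setting), which is not necessarily the
// account the triggering blobs live in.
var connectionString = Environment.GetEnvironmentVariable("AzureWebJobsStorage");
var container = new BlobContainerClient(connectionString, "azure-webjobs-hosts");

// Blob receipts live under the blobreceipts/ prefix, one per processed blob.
await foreach (var receipt in container.GetBlobsAsync(prefix: "blobreceipts/"))
{
    Console.WriteLine(receipt.Name);
}
```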
So:
Say you process a bunch of files in your Azure instance. It will keep track of that state in its storage account.
Then you start your local debug session, which is looking at the same blob store but has none of that state, because the state lives in a different storage account. It will re-process all those files.
If you stop and restart the debug session, it will pick up where you left off, because the local state already shows those files as processed.
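If you want local debug sessions to stay "drained" across checkouts, the fix that falls out of this is to give the host a persistent storage account of its own. Here's a sketch of what local.settings.json could look like, assuming AzureWebJobsStorage points at a persistent dev storage account (or a persisted Azurite instance) while "MyStorage" remains the shared blob account; the connection strings are placeholders:

```json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "<connection string for a persistent dev storage account>",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet",
    "MyStorage": "<connection string for the shared blob storage account>"
  }
}
```

With that in place, the backlog drains once, the receipts persist in that account's azure-webjobs-hosts container, and later debug runs only fire for genuinely new blobs.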
There's a good description here: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-trigger?tabs=python-v2%2Cisolated-process%2Cnodejs-v4&pivots=programming-language-csharp#blob-receipts