Remove Azure Blob Storage contents that are untouched for a period of time


The application I developed allows users to upload content, which is stored in Azure Blob Storage.

Since the content is meant for quick sharing between users, much of it becomes untouched after a period of time, although some content is used over and over again.

To stop the unbounded growth of the blob storage, I am planning to write a tool that finds any blobs that haven't been used for a period of time and deletes them from storage.

If this were a standard file system, I could use the "Last Access Time" to tell when a file was last used. However, I can't seem to find a similar property on a blob to determine its last access time.

Has anyone come across this situation before? What would be the best way to achieve this? Or am I too concerned about it?

Any feedback or suggestions are greatly appreciated.

Thank you in advance.

5 Answers

Answer (0 votes):

This is now available through Lifecycle Management; the feature is in preview in France Central, Canada East, and Canada Central.



Answer (3 votes):

You can use a blob's Properties.LastModifiedUtc to get the last modified date. For a page or block blob, call GetPageBlobReference or GetBlockBlobReference to get the blob reference, then call FetchAttributes() to populate the properties, after which you can read LastModifiedUtc.

For example, with a block blob, here is the code snippet:

CloudBlockBlob blockBlob = container.GetBlockBlobReference(blobName); // pass the blob's name, not its full URI
blockBlob.FetchAttributes(); // populates blockBlob.Properties
// blockBlob.Properties.LastModifiedUtc now holds the blob's last modified date.
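For the cleanup tool the question describes, the same last-modified property can drive a deletion pass. A minimal sketch in Python using the azure-storage-blob package (the 90-day threshold is illustrative, and note this keys off last *modified*, not last accessed, as the question points out); the age check is kept in a small helper so it can be tested on its own:

```python
from datetime import datetime, timedelta, timezone

def is_stale(last_modified, now, max_age_days):
    """True if the blob has not been modified within max_age_days."""
    return now - last_modified > timedelta(days=max_age_days)

def delete_stale_blobs(container_client, max_age_days=90):
    """Delete blobs whose last-modified time is older than max_age_days.

    container_client is assumed to be an azure.storage.blob.ContainerClient;
    list_blobs() yields BlobProperties objects carrying .last_modified.
    """
    now = datetime.now(timezone.utc)
    for blob in container_client.list_blobs():
        if is_stale(blob.last_modified, now, max_age_days):
            container_client.delete_blob(blob.name)
```

Running this on a schedule (e.g. a daily WebJob or Azure Function) approximates the "garbage collector" the question asks for, at the cost of treating modification time as a proxy for access time.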
Answer (0 votes):

I can only see two ways of handling this:

  1. Front the access to the blob such that they must hit a service to get the blob URL with SAS signature. This way you can count and monitor which blobs are getting accessed. Remove older blobs that have low/no access after some time. This requires turning off public access so people cannot just go around your SAS signature.
  2. Turn on storage analytics and monitor the GET requests. You would have to parse all the GET accesses for a month for example ($logs are updated hourly) and group by resource. If you automated this, it would not be too terrible. This would give you a list of all the resources that had been accessed.
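The second approach can be sketched as a small parser over the $logs entries. A sketch in Python, assuming the v1.0 Storage Analytics log format (semicolon-delimited, with request-start-time, operation-type, and requested-object-key in fields 2, 3, and 13, and 7-digit fractional seconds in timestamps); the account and blob names in any sample line would be made up:

```python
import csv
from datetime import datetime, timezone

def _parse_ts(s):
    # Azure analytics timestamps carry 7 fractional digits, which strptime's
    # %f (max 6) rejects, so parse the fraction separately.
    head, frac = s.rstrip("Z").split(".")
    return datetime.strptime(head, "%Y-%m-%dT%H:%M:%S").replace(
        microsecond=int(frac[:6].ljust(6, "0")), tzinfo=timezone.utc)

def last_get_times(log_lines):
    """Return {requested-object-key: most recent GetBlob timestamp}."""
    latest = {}
    # csv with delimiter=";" handles the quoted URL/object-key fields.
    for row in csv.reader(log_lines, delimiter=";"):
        if len(row) < 13 or row[2] != "GetBlob":
            continue  # only GET accesses matter for "last used"
        ts = _parse_ts(row[1])
        key = row[12]
        if key not in latest or ts > latest[key]:
            latest[key] = ts
    return latest
```

Any blob missing from the resulting map (or whose latest GET is older than your cutoff) is a deletion candidate, which is exactly the grouping-by-resource step described above.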
Answer (0 votes):

If you are using Blob storage, then the approach Gaurav suggested is your best option. See here for a doc on getting started:

https://azure.microsoft.com/en-us/documentation/articles/storage-analytics/

Note that our .NET client libraries do include support for parsing log files - you can see a demo of this in our client library unit tests:

https://github.com/Azure/azure-storage-net/search?utf8=%E2%9C%93&q=ListLogs

Answer (1 vote):

This is much more straightforward now with Azure Blob Storage support for lifecycle management.

Edit: As pointed out, Blob storage lifecycle management only allows setting up rules based on the last modification date, not the last accessed date.

Manage the Azure Blob storage lifecycle

Azure Blob storage lifecycle management offers a rich, rule-based policy for GPv2 and Blob storage accounts. Use the policy to transition your data to the appropriate access tiers or to expire data at the end of its lifecycle.

The lifecycle management policy lets you:

  • Transition blobs to a cooler storage tier (hot to cool, hot to archive, or cool to archive) to optimize for performance and cost
  • Delete blobs at the end of their lifecycles
  • Define rules to be run once per day at the storage account level
  • Apply rules to containers or a subset of blobs (using prefixes as filters)

(Screenshot: Azure Lifecycle Management rule configuration)
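For reference, a minimal lifecycle policy of this shape (the rule name, "uploads/" prefix, and 90-day threshold are illustrative) that deletes block blobs based on their last modification date:

```json
{
  "rules": [
    {
      "name": "delete-stale-blobs",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["uploads/"]
        },
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 90 }
          }
        }
      }
    }
  ]
}
```

The policy is applied at the storage account level (e.g. via the portal or `az storage account management-policy create`), and the rules are evaluated roughly once per day as described above.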