Deleting files in Azure Synapse Notebook


This should have been simple but turned out to require a bit of GoogleFu. I have an Azure Synapse Spark Notebook written in C# that

  • Receives a list of Deflate-compressed IIS files.
  • Reads the files as binary into a DataFrame.
  • Decompresses these files one at a time and writes them out in Parquet format.

Now, after all of them have been processed successfully, I need to delete the compressed files.


There are 2 best solutions below


This is only my proof of concept, but it works perfectly.

#r "nuget:Azure.Storage.Files.DataLake,12.0.0-preview.9"

using Microsoft.Spark.Extensions.Azure.Synapse.Analytics.Utils;
using Microsoft.Spark.Extensions.Azure.Synapse.Analytics.Notebook.MSSparkUtils;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;

// Ask Synapse for a SAS token through the linked service.
string blob_sas_token = Credentials.GetConnectionStringOrCreds("your linked service name here");

// Replace each single-quoted placeholder (quotes included) with your own value.
Uri uri = new Uri($"https://'your storage account name here'.blob.core.windows.net/'your container name here'{blob_sas_token}");
DataLakeServiceClient _serviceClient = new DataLakeServiceClient(uri);
DataLakeFileSystemClient fileClient = _serviceClient.GetFileSystemClient("'path to directory containing the file here'");

// Delete the compressed file by name.
fileClient.DeleteFile("'file name here'");

The call to Credentials.GetConnectionStringOrCreds returns a signed SAS token that your code can append directly to a storage resource URI.

You could, of course, use the DeleteFileAsync method instead if you prefer an asynchronous call.

Hope this saves someone else a few hours of GoogleFu.


PySpark users can run the following command in a notebook:

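# Replace '<file path>' with the path to delete; True removes directories recursively.
mssparkutils.fs.rm('<file path>', True)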
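
Because the question deletes a whole list of compressed files only after processing has finished, the single rm call can be wrapped in a small loop. What follows is a minimal sketch under stated assumptions, not a definitive implementation: `delete_processed_files` and its `rm` parameter are hypothetical names introduced here for illustration, and `rm` is assumed to behave like `mssparkutils.fs.rm(path, recurse)`.

```python
from typing import Callable, Iterable, List, Tuple


def delete_processed_files(
    paths: Iterable[str],
    rm: Callable[[str, bool], object],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Delete each path with `rm`, collecting failures instead of stopping.

    `rm` is a hypothetical injection point; it is assumed to take
    (path, recurse) the way mssparkutils.fs.rm does.
    Returns (deleted_paths, [(failed_path, error_message), ...]).
    """
    deleted: List[str] = []
    failed: List[Tuple[str, str]] = []
    for path in paths:
        try:
            # recurse=False: each entry is a single file, not a directory
            rm(path, False)
            deleted.append(path)
        except Exception as exc:
            # Record the failure and keep going so one bad path
            # does not block the remaining deletions.
            failed.append((path, str(exc)))
    return deleted, failed
```

In a Synapse notebook, where the runtime provides `mssparkutils`, this could be called as `deleted, failed = delete_processed_files(compressed_paths, mssparkutils.fs.rm)`, with `compressed_paths` standing in for your own list of file paths. Passing the delete function as a parameter also lets the loop be exercised outside Synapse with a stand-in function.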