My existing solution read a base64 string and wrote it out as a file in Azure Blob Storage:
import base64

from azure.storage.blob import BlobServiceClient
from pyspark.sql.functions import udf
from pyspark.sql.types import BinaryType

# Initialize Azure Blob Service Client
connection_string = "DefaultEndpointsProtocol=https;AccountName=xxxxx;AccountKey=xxxxxxx;EndpointSuffix=core.windows.net"  # Replace with your connection string
container_name = "sandpit"  # Replace with your container name (container names may not contain "/")
folder_name = "Attachments"  # Virtual folder inside the container
blob_service_client = BlobServiceClient.from_connection_string(connection_string)

def write_file_to_blob(data, filename):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=f"{folder_name}/{filename}")
    blob_client.upload_blob(data, overwrite=True)

# UDF to decode base64
def decode_base64(base64_str):
    return base64.b64decode(base64_str)

# Register UDF
decode_udf = udf(decode_base64, BinaryType())
I was calling the above like this:
# Collect the decoded rows to the driver
collected_data = df_with_decoded_data.collect()

# Write each file to blob storage
for row in collected_data:
    write_file_to_blob(row['DecodedData'], row['FinalFileName'])
Now I want to move this to OneLake. What is the way to establish a connection to OneLake files/folders and perform the same task?
What sort of credentials does OneLake require, and how are they passed?
I managed to get it functioning because the user account running the notebook had complete access to the /Files directory. Since this was a one-time task, I didn't proceed to integrate it with a Service Principal or Managed Identity.
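For reference, a hedged sketch of what the service-principal route might look like: OneLake exposes an ADLS Gen2-compatible DFS endpoint at `https://onelake.dfs.fabric.microsoft.com`, where the workspace acts as the filesystem and the lakehouse as the top-level directory, so `azure-identity` plus `azure-storage-file-datalake` should work. All names and credentials below are placeholders, and the service principal would first need to be granted access to the Fabric workspace (and allowed to call Fabric APIs in the tenant settings).

```python
def upload_to_onelake(data: bytes, relative_path: str,
                      workspace: str, lakehouse: str,
                      tenant_id: str, client_id: str, client_secret: str) -> None:
    """Upload bytes to <workspace>/<lakehouse>.Lakehouse/Files/<relative_path> in OneLake."""
    # Imported lazily so this sketch only needs the azure packages when called.
    from azure.identity import ClientSecretCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    credential = ClientSecretCredential(tenant_id, client_id, client_secret)

    # OneLake speaks the ADLS Gen2 DFS API: the workspace maps to the
    # filesystem, and the lakehouse is the top-level directory inside it.
    service = DataLakeServiceClient(
        account_url="https://onelake.dfs.fabric.microsoft.com",
        credential=credential,
    )
    fs = service.get_file_system_client(workspace)
    file_client = fs.get_file_client(f"{lakehouse}.Lakehouse/Files/{relative_path}")
    file_client.upload_data(data, overwrite=True)
```

Usage would mirror the Blob Storage version, e.g. `upload_to_onelake(row['DecodedData'], f"Attachments/{row['FinalFileName']}", ...)` inside the loop.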
The following code worked well for me:
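A minimal sketch of that approach (assuming the notebook's default lakehouse, which Fabric mounts at `/lakehouse/default`, and an assumed `Attachments` subfolder under `Files`): because the lakehouse is mounted into the notebook's local filesystem, OneLake files can be written with plain Python file I/O, with no extra credentials beyond the notebook user's own access.

```python
import base64
import os

# In a Fabric notebook the default lakehouse is mounted at /lakehouse/default,
# so its OneLake Files area is reachable as an ordinary local directory.
# "Attachments" is an assumed target folder name.
ONELAKE_FILES_DIR = "/lakehouse/default/Files/Attachments"

def decode_base64(base64_str: str) -> bytes:
    return base64.b64decode(base64_str)

def write_file_to_onelake(data: bytes, filename: str,
                          base_dir: str = ONELAKE_FILES_DIR) -> str:
    """Write raw bytes under the mounted OneLake Files directory; returns the path."""
    os.makedirs(base_dir, exist_ok=True)
    path = os.path.join(base_dir, filename)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

The driver-side loop then stays the same shape as before: `write_file_to_onelake(row['DecodedData'], row['FinalFileName'])` for each collected row.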