I am facing below error , while loading data from azure blob to snowflake:

ErrorCode=EncryptedExcelIsNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Encrypted excel file 'Reports.csv' is not supported, please remove its password.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=ICSharpCode.SharpZipLib.Zip.ZipException,Message=Wrong Local header signature: 0x6167724F,Source=ICSharpCode.SharpZipLib,'

Though files are not encrypted and not password protected. Could you please help to resolve this issue. Thanks in advance.

1

There are 1 best solutions below

0
On
ErrorCode=EncryptedExcelIsNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Encrypted excel file 'Reports.csv' is not supported, please remove its password.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=ICSharpCode.SharpZipLib.Zip.ZipException,Message=Wrong Local header signature: 0x6167724F,Source=ICSharpCode.SharpZipLib,'

The error indicates that your file is still encrypted and protected. At this moment, ADF does not support encrypted or protected Excel files. So, you can follow the workaround below:

Read the password-protected Excel file using Python in Azure Databricks with the following code:

import io
import msoffcrypto
import openpyxl
from azure.storage.blob import BlobServiceClient
from azure.storage.blob import BlobClient

connection_string = "<blobStorageConnnectionString>"
# Create a BlobServiceClient
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
# Get the blob client for your Excel file
blob_client = blob_service_client.get_blob_client(container="<containerName>", blob="<xlsxFilePath>")
# Download the blob contents into a stream
stream = io.BytesIO()
blob_client.download_blob().readinto(stream)
# Decrypt the workbook
decrypted_workbook = io.BytesIO()
office_file = msoffcrypto.OfficeFile(stream)
office_file.load_key(password='<password>')
office_file.decrypt(decrypted_workbook)
# Load the workbook using openpyxl
workbook = openpyxl.load_workbook(filename=decrypted_workbook)
for sheet_name in workbook.sheetnames:
    sheet = workbook[sheet_name]
    print(f"Sheet: {sheet_name}")
    for row in sheet.iter_rows(values_only=True):
        print(row)
    print()

It will read the Excel sheets and files as shown below:

enter image description here

You can use the following code to convert those sheets into CSV format and upload them to Blob storage:

import pandas as pd
from azure.storage.blob import BlobClient

# Load the workbook using pandas
xls = pd.ExcelFile(decrypted_workbook)

# Iterate over each sheet and convert it to CSV
for sheet_name in xls.sheet_names:
    df = pd.read_excel(xls, sheet_name)
    csv_data = df.to_csv(index=False)
    
    # Upload the CSV data to the same blob location with a different name
    csv_blob_name = f"outputs/{sheet_name}.csv"
    csv_blob_client = blob_service_client.get_blob_client(container="files", blob=csv_blob_name)
    csv_blob_client.upload_blob(csv_data, overwrite=True)
    
    print(f"CSV file for sheet '{sheet_name}' uploaded to Azure Blob Storage.")

print("All sheets converted to CSV and uploaded successfully.")

All sheets will be copied in CSV format to Blob storage as shown below:

enter image description here

enter image description here

Use those CSV files as a source and copy the data into the Snowflake. For more information, you can refer to this MS Question.