How to run a simple batch processing job in Azure in the most efficient and cheap manner

I will be receiving around 300 .csv files (small files, a few KB each, no cleansing required) every month. They need light processing (using Python code) to find interesting facts for a report, after which the output reports should be sent to cloud storage and emailed to a specific address. The challenge is to design the architecture in the most efficient and cost-effective way. I have a couple of options, but I would like some suggestions on how to do it using Azure.

One option is to use an Azure Blob Storage trigger with an Azure Function and then store the results in an output zone of the blob storage. One catch is that an Azure Function on the Consumption plan can run for at most 10 minutes.

This option looks cheap, but since the job is batch in nature, I am wondering if there is an even cheaper and more efficient option.


Answer from SiddheshDesai:

You can make use of the Azure Batch service to run your Python-based batch processing, like below:

I referred to this Python Batch service quickstart, cloned the same application, created a Batch account, and edited config.py with the values below:

BATCH_ACCOUNT_NAME = 'siliconbatch1'  
BATCH_ACCOUNT_KEY = 'xxxxxxxBalNG3Iw=='
BATCH_ACCOUNT_URL = 'https://siliconbatch1.australiaeast.batch.azure.com'  
STORAGE_ACCOUNT_NAME = 'valleystrg129'
STORAGE_ACCOUNT_KEY = 'xxxxxxxDF+AStPI60+Q=='
STORAGE_ACCOUNT_DOMAIN = 'blob.core.windows.net' 
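
For context, the quickstart's client code roughly does the following. This is a minimal sketch against the azure-batch SDK, assuming hypothetical pool/job/task ids and a process.py script that is not part of the quickstart:

from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
import azure.batch.models as batchmodels

import config  # the edited config.py above

credentials = SharedKeyCredentials(config.BATCH_ACCOUNT_NAME,
                                   config.BATCH_ACCOUNT_KEY)
batch_client = BatchServiceClient(credentials,
                                  batch_url=config.BATCH_ACCOUNT_URL)

# Create a small pool of Ubuntu nodes (pool id and VM size are examples)
pool = batchmodels.PoolAddParameter(
    id='csv-pool',
    vm_size='Standard_A1_v2',
    target_dedicated_nodes=1,
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher='canonical',
            offer='0001-com-ubuntu-server-focal',
            sku='20_04-lts',
            version='latest'),
        node_agent_sku_id='batch.node.ubuntu 20.04'))
batch_client.pool.add(pool)

# Create a job on the pool and add one task per CSV file
job = batchmodels.JobAddParameter(
    id='csv-report-job',
    pool_info=batchmodels.PoolInformation(pool_id='csv-pool'))
batch_client.job.add(job)

tasks = [batchmodels.TaskAddParameter(
             id=f'process-{i}',
             command_line=f"/bin/bash -c 'python3 process.py file{i}.csv'")
         for i in range(3)]
batch_client.task.add_collection(job.id, tasks)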


Output:

The batch job ran successfully and the output blobs were uploaded to the storage account (portal screenshots omitted).
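
If you want to verify the upload from code instead of the portal, here is a short sketch with the azure-storage-blob SDK; the container name and connection string are placeholders:

from azure.storage.blob import BlobServiceClient

# Placeholder connection string; use the storage account from config.py
service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("output")  # hypothetical container name

# List the blobs the batch job produced
for blob in container.list_blobs():
    print(blob.name, blob.size)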

Another alternative, as you suggested, is an Azure Functions blob trigger with input and output bindings set to blobs, where the trigger copies the blob from one container to another in the same or a different storage account. Refer to the steps below to implement this scenario:

My __init__.py:

import logging

import azure.functions as func


def main(myblob: func.InputStream, outputBlob: func.Out[str]) -> None:
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")

    # Read the triggering blob's content as UTF-8 text
    clear_text = myblob.read().decode('utf-8')

    # Write the same content to the output blob binding (a plain copy)
    outputBlob.set(clear_text)
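
The function above only copies the blob as-is. If you want the CSV analysis to happen inside the function, a minimal sketch using Python's built-in csv module could replace the body; the row-count "report" here is a hypothetical stand-in for your actual logic:

import csv
import io

import azure.functions as func


def main(myblob: func.InputStream, outputBlob: func.Out[str]) -> None:
    # Parse the incoming CSV blob into dictionaries keyed by header row
    text = myblob.read().decode('utf-8')
    rows = list(csv.DictReader(io.StringIO(text)))

    # Hypothetical report; replace with your own "interesting facts" logic
    report = f"File: {myblob.name}\nRows processed: {len(rows)}\n"

    # Write the report to the output blob binding
    outputBlob.set(report)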

function.json with the input and output bindings pointing at two different blob paths; you can set these according to your requirement:


{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "myblob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "testhubname-applease/{name}",
      "connection": "valleystrg129_STORAGE"
    },
    {
      "type": "blob",
      "direction": "out",
      "name": "outputBlob",
      "path": "testhubname-leases/{rand-guid}",
      "connection": "valleystrg129_STORAGE"
    }
  ]
}
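
For local testing, the connection name valleystrg129_STORAGE referenced in function.json must resolve to an app setting. A sketch of local.settings.json, assuming the same connection name (the connection string itself is a placeholder):

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "valleystrg129_STORAGE": "<storage-account-connection-string>"
  }
}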

Output:

On upload, the trigger fired and the blob was copied to the output container (screenshot omitted).