I'm looking for some advice on the best / most cost effective solutions to use for my use case on Google Cloud (described below).

Currently, I'm using Cloud Composer, and it's way too expensive. It seems like this is the result of composer always running, so I'm looking for something that either isn't constantly running or is much cheaper to run / can accomplish the same thing.

Use Case / Process >> I have a process setup that follows the below steps:

  1. There is a site built with Firebase that has a file drop / upload (CSV) functionality to import data into Google Storage
  2. That file drop triggers a cloud function that starts the Cloud Composer DAG
  3. The DAG moves the CSV from Cloud Storage to BigQuery while also performing a bunch of modifications to the dataset using Python / SQL queries.

Any advice on what would potentially be a better solution?

It seems like Dataflow might be an option, but pretty new and wanted a second opinion.

Appreciate the help!

2

There are 2 best solutions below

0
On

If your file is not so big, you can process it with python and pandas data frame, in my experience it works very well with files around 1,000,000 rows

then with the bigquery API you can upload directly the dataframe transformed into bigquery, all in your cloud function, remember that cloud functions can process data until 9 minutes, the best, this way is costless.

0
On

Was looking into it recently myself. I'm pretty sure Dataflow can be used for this case, but I doubt it will be cheaper (also considering time you will spend learning and migrating to Dataflow if you are not an expert already).

Depending on the complexity of transformations you do on the file, you can look into data integration solutions such as https://fivetran.com/, https://www.stitchdata.com/, https://hevodata.com/ etc. They are mainly build to just transfer your data from one place to another, but most of them are also able to perform some transformations on the data. If I'm not mistaken in Fivetran it's sql based and in Hevo it's python.

There's also this article that goes into scaling up and down Composer nodes https://medium.com/traveloka-engineering/enabling-autoscaling-in-google-cloud-composer-ac84d3ddd60 . Maybe it will help you to save some cost. I didn't notice any significant cost reduction to be honest, but maybe it works for you.