I'm looking for some advice on the best / most cost effective solutions to use for my use case on Google Cloud (described below).
Currently, I'm using Cloud Composer, and it's way too expensive. This seems to be because Composer is always running, so I'm looking for something that either isn't constantly running or is much cheaper to run and can accomplish the same thing.
Use Case / Process >> I have a process set up that follows these steps:
- There is a site built with Firebase that has CSV file drop / upload functionality to import data into Cloud Storage
- That file drop triggers a cloud function that starts the Cloud Composer DAG
- The DAG moves the CSV from Cloud Storage to BigQuery while also performing a bunch of modifications to the dataset using Python / SQL queries.
Any advice on what would potentially be a better solution?
It seems like Dataflow might be an option, but I'm pretty new to it and wanted a second opinion.
Appreciate the help!
If your file is not too big, you can process it with Python and a pandas DataFrame. In my experience this works very well with files of around 1,000,000 rows.
Then, with the BigQuery API, you can load the transformed DataFrame directly into BigQuery, all inside your Cloud Function. Remember that a Cloud Function can run for at most 9 minutes per invocation. Best of all, this approach costs next to nothing.
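
To make that concrete, here is a rough sketch of what such a function could look like. It is only an illustration under a few assumptions: a background function triggered when an object is finalized in the bucket (so it receives `event` and `context`), the `pandas`, `gcsfs`, `pyarrow` and `google-cloud-bigquery` packages listed in `requirements.txt`, and a made-up destination table in `TABLE_ID`. The names `gcs_uri`, `transform` and `load_csv_to_bigquery` are placeholders, and the clean-up inside `transform` just stands in for whatever your DAG does today.

```python
"""Sketch of a Cloud Storage-triggered Cloud Function: CSV -> pandas -> BigQuery."""

# Hypothetical destination table; replace with your own project.dataset.table.
TABLE_ID = "my-project.my_dataset.uploads"


def gcs_uri(event):
    """Build the gs:// URI of the uploaded file from the trigger payload."""
    return f"gs://{event['bucket']}/{event['name']}"


def transform(df):
    """Placeholder clean-up; put your own Python modifications here."""
    df = df.dropna(how="all")  # drop rows that are completely empty
    # Normalise headers, e.g. " Order Id " becomes "order_id".
    df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]
    return df


def load_csv_to_bigquery(event, context):
    """Entry point, assuming a background function fired on object finalize."""
    # Third-party imports are kept inside the entry point so the helpers
    # above stay importable without them.
    import pandas as pd
    from google.cloud import bigquery

    if not event["name"].lower().endswith(".csv"):
        return  # ignore uploads that are not CSV files

    # Reading a gs:// path with pandas assumes gcsfs is installed.
    df = transform(pd.read_csv(gcs_uri(event)))

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(write_disposition="WRITE_APPEND")
    # Loading a DataFrame assumes pyarrow is installed.
    job = client.load_table_from_dataframe(df, TABLE_ID, job_config=job_config)
    job.result()  # block until the load job finishes
```

If some of your modifications are easier to express in SQL, one option is to run them after the load with `client.query(sql).result()` from the same function, as long as the whole thing stays within the time limit.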