Not sure how to pass .env file to Dataproc Job Submit call

I created a Python library that, in certain places, needs to call an API to fetch some data. I handled credentials by putting a .env file at the root of our repo and reading it with dotenv.load_dotenv(dotenv.find_dotenv()).
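
For context, the loading pattern inside the library looks roughly like this (a simplified sketch; API_KEY is just a placeholder for whatever variables the library actually reads):

    import os
    import dotenv

    # find_dotenv() walks up parent directories, starting from the file that
    # calls it, until it finds a .env file; load_dotenv() then loads its
    # key/value pairs into os.environ.
    dotenv.load_dotenv(dotenv.find_dotenv())

    api_key = os.environ["API_KEY"]  # placeholder credential name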

I am planning on running this code through a Dataproc cluster that's already created by submitting the job like so:

gcloud dataproc jobs submit pyspark gs://tabsearch/main.py \
 --cluster=tabsearch-cluster \
 --region=us-central1 \
 --py-files=gs://tabsearch_poc/src.zip \
 --properties-file=gs://tabsearch_poc/.env

Where gs://tabsearch/main.py is the main file, gs://tabsearch_poc/src.zip contains its supporting modules, and gs://tabsearch_poc/.env is the file with the credentials. When I execute the call above, I get the following error:

Traceback (most recent call last):
  File "/tmp/dskajdnaksjdw28834e2/main.py", line 7, in <module>
    inventory_toolkit = toolkit()
  File "src.zip/src/inventory_tool/toolkit.py", line 14, in toolkit
  File "/opt/conda/default/lib/python3.10/site-packages/dotenv/main.py", line 300, in find_dotenv
    for dirname in _walk_to_root(path):
  File "/opt/conda/default/lib/python3.10/site-packages/dotenv/main.py", line 257, in _walk_to_root
    raise IOError('Starting path not found')
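
For reference, the failing call is essentially the following (a simplified sketch of src/inventory_tool/toolkit.py; the real setup logic is elided):

    import dotenv

    def toolkit():
        # This is the line that fails on the cluster: the module is imported
        # from src.zip, so its path is not a real directory on disk, which is
        # presumably why find_dotenv()'s directory walk raises
        # IOError('Starting path not found') before it ever looks for .env.
        dotenv.load_dotenv(dotenv.find_dotenv())
        # ... build and return the toolkit object (elided)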

How is this typically done? It's worth mentioning that we're trying to avoid passing credentials on the gcloud command line, because eventually this submission will be handed over to other services that call the system I'm building.
