I created a Python library that in certain places has to call an external API to fetch some data. I handled credentials by creating a .env file at the root of our repo and reading it with dotenv.load_dotenv(dotenv.find_dotenv()).
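For reference, the relevant part of toolkit.py looks roughly like this (a simplified sketch; API_KEY is just a placeholder for the actual variables we load):

import os
import dotenv

def toolkit():
    # find_dotenv() walks up from the calling module's directory until it finds a .env file
    dotenv.load_dotenv(dotenv.find_dotenv())
    api_key = os.environ["API_KEY"]  # placeholder; consumed by the API client
    ...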
I am planning to run this code on an existing Dataproc cluster by submitting the job like so:
gcloud dataproc jobs submit pyspark gs://tabsearch/main.py \
--cluster=tabsearch-cluster \
--region=us-central1 \
--py-files=gs://tabsearch_poc/src.zip \
--properties-file=gs://tabsearch_poc/.env
Here gs://tabsearch/main.py is the entry point, gs://tabsearch_poc/src.zip contains its supporting modules, and gs://tabsearch_poc/.env is the file with the credentials. When I run the command above, I get the following error:
Traceback (most recent call last):
File "/tmp/dskajdnaksjdw28834e2/main.py", line 7, in <module>
inventory_toolkit = toolkit()
File "src.zip/src/inventory_tool/toolkit.py", line 14, in toolkit
File "/opt/conda/default/lib/python3.10/site-packages/dotenv/main.py", line 300, in find_dotenv
for dirname in _walk_to_root(path):
File "/opt/conda/default/lib/python3.10/site-packages/dotenv/main.py", line 257, in _walk_to_root
raise IOError('Starting path not found')
How is this typically done? It's worth mentioning that we're trying to avoid passing credentials on the gcloud command line, because the job submission will eventually be handed over to other services that call the system I'm building.