I am working with Informatica Data Integrator and trying to set up a connection to a Databricks cluster. So far everything seems to work, but one issue is that under the Spark configuration we had to put the SAS key for the ADLS Gen2 storage account.
The reason for this is that when Informatica writes to Databricks, it first stages the data in a folder in ADLS Gen2, and Databricks then picks up that file and writes it out as a Delta Lake table.
The problem is that the Spark config field contains the full SAS value (URL plus token and password) in plain text, visible to anyone who can see the cluster configuration. That is not acceptable unless we restrict admin access to a single person.
Has anyone worked with Informatica and Databricks? Is it possible to put the Spark config in a file and have the Informatica connector read that file? Or is it possible to add the SAS key to the Spark cluster itself (the interactive cluster we use) and have the cluster read it from a file?
Thank you for any help with this.
You don't need to put the SAS key value into the Spark configuration directly. Instead, store the value in an Azure Key Vault-backed secret scope (on Azure) or a Databricks-backed secret scope (on other clouds), and then reference it from the Spark configuration using the syntax

{{secrets/<secret-scope-name>/<secret-key>}}

(see the Databricks secrets documentation). With this approach, the SAS key value is read at cluster start and is not visible to users who have access to the cluster UI.
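
As a sketch of how this could look end to end: first create a Databricks-backed scope and store the token using the Databricks CLI (the scope name informatica-adls and key name adls-sas-token below are placeholders, not anything from your setup; the put command opens an editor unless you pass the value with --string-value):

databricks secrets create-scope --scope informatica-adls
databricks secrets put --scope informatica-adls --key adls-sas-token

Then, in the cluster's Spark config, reference the secret instead of pasting the raw token. A minimal example, assuming your cluster's ABFS driver supports the fixed-SAS-token properties from the Databricks ADLS Gen2 docs (replace <storage-account> and the scope/key names with your own):

fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net SAS
fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider
fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net {{secrets/informatica-adls/adls-sas-token}}

With this setup, the Informatica connection no longer needs the raw SAS value at all; rotating the token only requires updating the secret, and anyone viewing the cluster's Spark config sees just the {{secrets/...}} reference.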