I was wondering what the best practices are around mounting and unmounting storage in Databricks via DBFS.
More Details: We are using Azure Data Lake Storage. We have multiple notebooks, and in each notebook we have code that mounts the storage, processes files, and then unmounts at the end (using code similar to https://forums.databricks.com/questions/8103/graceful-dbutils-mountunmount.html). But it looks like mount points are shared by all notebooks, so I was wondering what would happen if two notebooks began running around the same time and one ran faster than the other. Could we get into a situation where the first notebook unmounts the storage while the second notebook is still in the middle of its processing?
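For reference, each of our notebooks currently does roughly the following (the mount point, container, storage account, and secret scope names here are placeholders, not our real ones):

```python
# Rough shape of what each notebook does today: mount at the start,
# process files, unmount at the end.
mount_point = "/mnt/mydata"  # placeholder mount point

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("my-scope", "sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("my-scope", "sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# "Graceful" mount: only mount if not already mounted (as in the linked forum thread)
if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source="abfss://mycontainer@myaccount.dfs.core.windows.net/",
        mount_point=mount_point,
        extra_configs=configs,
    )

df = spark.read.parquet(f"{mount_point}/input/")  # ... process files ...

dbutils.fs.unmount(mount_point)  # unmount at the end of the notebook
```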
So should one be mounting within a notebook, or should this be done in some sort of initialization routine that all notebooks call? Similarly, should one try to unmount within a notebook, or should we just not bother with unmounts?
Are there any best practices I should be following?
Note: I am a newbie to Databricks and Python.
Mounting is usually done once per storage account/container/etc. There is no point in repeating it again and again, and re-mounting while somebody is working with the data can lead to data loss - we have seen that happen in practice. So it's better to mount everything once, when creating the workspace or when adding a new storage account/container, and not remount it. From an automation standpoint, I would recommend using the corresponding resources in the Databricks Terraform provider.
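If you are not ready for Terraform, a rough sketch of such a one-time setup notebook could look like this (container and account names are placeholders, and the OAuth settings assume a service principal whose credentials sit in a secret scope):

```python
# One-time setup notebook: mount every container the workspace needs, instead of
# mounting/unmounting inside each processing notebook. Re-running is harmless.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("my-scope", "sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("my-scope", "sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

containers = {
    "raw":     "abfss://raw@myaccount.dfs.core.windows.net/",
    "curated": "abfss://curated@myaccount.dfs.core.windows.net/",
}

existing = {m.mountPoint for m in dbutils.fs.mounts()}
for name, source in containers.items():
    mount_point = f"/mnt/{name}"
    if mount_point not in existing:  # idempotent: skip what is already mounted
        dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)

# Processing notebooks then just read/write /mnt/raw and /mnt/curated and never unmount.
```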
But the main problem with mounts is that anyone in the workspace can use them, and data access happens under the permissions of whoever mounted them (for example, a service principal). Because of this, mounts are quite bad from a security point of view (unless you use the so-called credential passthrough mounts available on Azure).
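For illustration, a credential passthrough mount on Azure looks roughly like this (names are placeholders, the cluster must have Azure AD credential passthrough enabled, and treat the exact config keys as my recollection of the documented setup rather than something authoritative). With such a mount, access happens under each user's own identity instead of the identity that created the mount:

```python
# Sketch of an Azure AD credential passthrough mount (requires a
# passthrough-enabled cluster); data access then uses the calling user's identity.
passthrough_configs = {
    "fs.azure.account.auth.type": "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class":
        spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName"),
}

dbutils.fs.mount(
    source="abfss://mycontainer@myaccount.dfs.core.windows.net/",
    mount_point="/mnt/passthrough",
    extra_configs=passthrough_configs,
)
```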