For the past few days we've been waking up to errors in our pipelines because the Python packages the notebooks depend on are no longer added to the Spark pool. How can we ensure the packages persist?
We haven't worked out a pattern of why they disappear yet. Sometimes, only one or two packages disappear, sometimes it's all of them. Sometimes it seems like it's when an Azure DevOps deployment has been done. It's very inconsistent.
Using pip install in the notebook is not an option, as we have some custom wheels that are not publicly available.
Does anyone know what could be the cause and how we could resolve this?

A requirements.txt file acts as a configuration file that you upload to the Spark pool. When the pool initializes, it performs the equivalent of a "pip install" for every package listed in the file. To include additional packages later, update the requirements.txt file and restart the pool or apply the changes as needed.

To upload the file, go to the "Manage" section, select "Spark pools," and click the three dots next to the Spark pool where you want to add the packages. Upload your requirements file and confirm the changes by clicking "Apply."
You can now use your new libraries as needed.
In the requirements.txt above I have used the Python library splink.
I have also checked the installed libraries and verified that splink was installed successfully.
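As a quick check of your own, a small sketch like the following can be run in a notebook cell attached to the pool to confirm a package is visible to the session (assuming splink is the package in question):

```python
# Run in a notebook cell attached to the Spark pool to confirm the package is available.
import importlib.metadata

try:
    version = importlib.metadata.version("splink")
    print(f"splink {version} is installed on this pool")
except importlib.metadata.PackageNotFoundError:
    print("splink is NOT installed on this pool")
```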