How do I upgrade a library in Qubole's Jupyter Notebook, using PySpark?


Is there a way to do it right from a cell in the notebook, similar to pip install ... --upgrade? I couldn't work out how to follow the instructions at https://docs.qubole.com/en/latest/faqs/general-questions/install-custom-python-libraries.html#pre-installed-python-libraries. The current Python version is 3.5.3, and Pandas is 0.20.1. I need to upgrade Pandas and Matplotlib.


There are 2 best solutions below

1
On BEST ANSWER

In Qubole there are two ways to upgrade or install a package in the Python environment. Currently there is no interface inside the notebook for installing new packages.

New and Recommended Way (via Package Management): You can enable the Package Management functionality for an account and add new packages to a cluster via the UI. Package management has many advantages over cluster versions in terms of performance and usability. Refer to https://docs.qubole.com/en/latest/user-guide/package-management/index.html for further details.

Old Way (via bootstrap): You can configure a node bootstrap, which is basically a shell script executed on each node when the cluster starts or upscales (that is, when more nodes are added to the cluster). It is configured via the Clusters UI and needs a cluster restart for every change. This is what the link you shared describes.
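
Whichever route you take, it can help to confirm from a notebook cell which versions the Python environment actually picks up once the cluster is back up. The snippet below is only a minimal sketch: `installed_version` is a hypothetical helper name, not part of Qubole or PySpark, and it assumes the package exposes a `__version__` attribute (Pandas and Matplotlib both do). It reports what the Python process running the cell sees, not what is installed on every worker node.

```python
import importlib


def installed_version(package_name):
    """Return the package's __version__ string.

    Returns None if the package cannot be imported, and "unknown" if it
    imports but does not define __version__. (Hypothetical helper.)
    """
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


# Report the versions visible to the Python process running this cell.
for name in ("pandas", "matplotlib"):
    print(name, installed_version(name))
```

If a package still shows the old version, the cluster has most likely not been restarted since the change.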

0
On

You cannot download or upgrade packages directly from a cell in the notebook. This is because your notebook is associated with a cluster. To ensure that every node of the cluster has the package installed, you must use either package management (https://docs.qubole.com/en/latest/user-guide/package-management/package-management-environment.html) or the cluster's node bootstrap (https://docs.qubole.com/en/latest/user-guide/clusters/run-scripts-cluster.html#examples-node-scripts).

Do let me know if you have any further questions.