I have a problem with the Jupyter Notebook on Azure HDInsight: I cannot access files outside the notebook environment. I am trying to read files on the HDInsight cluster head node, which I can reach over ssh with my username and password from a remote terminal.
If I try to install the hdfs library, the notebook reports that it is already installed:
%pip install hdfs
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: hdfs in /home/spark/.local/lib/python3.8/site-packages (2.7.3)
Requirement already satisfied: docopt in /home/spark/.local/lib/python3.8/site-packages (from hdfs) (0.6.2)
... more statements like this follow ...
Note: you may need to restart the kernel to use updated packages.
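The "Defaulting to user installation" line suggests pip put the package into the user site-packages. As a quick sanity check of where that is for the kernel's interpreter, I can run this in a notebook cell (just a diagnostic sketch; the comments describe what I would expect on the head node, not guaranteed output):

```python
import site
import sys

# Where pip's "user installation" puts packages for this interpreter.
# This should match the /home/spark/.local/... path in the pip output
# if the notebook kernel and pip share the same Python environment.
print(site.getusersitepackages())

# The "python3.8" in the install path comes from this version
print(sys.version_info[:2])
```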
But when I try to import from the hdfs library, it fails:
from hdfs import InsecureClient
from pyspark.sql import SparkSession
An error was encountered:
No module named 'hdfs'
Traceback (most recent call last):
ModuleNotFoundError: No module named 'hdfs'
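To narrow down whether the import and the pip install are even using the same Python environment, I can compare the interpreter the kernel runs against the directories it searches on import (a diagnostic sketch; output will differ per environment):

```python
import sys

# The interpreter this kernel is actually executing code with
print(sys.executable)

# Directories searched on import; the pip install target
# /home/spark/.local/lib/python3.8/site-packages would need to
# appear in this list for `import hdfs` to succeed
for p in sys.path:
    print(p)
```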
What is causing this mismatch? Alternatively, is there a different way to access files on the HDInsight cluster head node from the notebook?