Can't access mounted Dataset on Azure Machine Learning Service Notebook


I am using the Notebook feature from the Azure Machine Learning Service. In this notebook I connected to the workspace, retrieved the corresponding datastore and retrieved my files as a file-dataset object. So far everything works.

from azureml.core import Workspace, Datastore, Dataset
import pandas as pd
import os

workspace = Workspace.from_config()
container = "cnt_name"
file_path = "actual_path"
# get datastore and dataset
datastore = Datastore.get(workspace, container)
datastore_path = [(datastore, file_path)]
dataset = Dataset.File.from_files(datastore_path)

Now I try to mount this file dataset:

mounted_path = "/tmp/test_dir4"
dataset_mounted = dataset.mount(mounted_path)

and everything seems fine. A quick ls gives the following output:

    ls -ltr /tmp/
    prwx------ 1 azureuser azureuser    0 May 12 13:29 clr-debug-pipe-14801-259046-out
    prwx------ 1 azureuser azureuser    0 May 12 13:29 clr-debug-pipe-14801-259046-in
    d--------- 0 root      root         0 May 12 13:29 test_dir4
    drwx------ 3 azureuser azureuser 4096 May 12 13:29 tmpjrb2tx8g
    -rw------- 1 azureuser azureuser  364 May 12 13:29 tmp5w_ikt6j
    drwx------ 2 azureuser azureuser 4096 May 12 13:29 pyright-14886-W3YT3PTdzoIO

But here is my problem: the mounted folder is owned by the root user, and I cannot access it, neither from the notebook nor from the shell. ls fails with the usual "path not found" or "permission denied" errors.

There are 2 best solutions below


You are nearly there! The name dataset.mount(mounted_path) is a bit misleading: it does not mount anything by itself. It returns a mount context, which you then have to start for the mount to become usable, as follows:

# mount dataset onto the mounted_path of a Linux-based compute
mount_context = dataset.mount(mounted_path)

mount_context.start()

Afterwards you can check with the following code that you indeed have access to the files:

import os
print(os.listdir(mounted_path))
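
If the listing still fails, it can help to look at the state of the directory itself. The following is only a minimal sketch using the standard library; the helper name describe_mount is made up for this illustration, and the path is the one from the question. It reports whether the path exists, whether something is currently mounted on it, and whether the current user may read and enter it.

```python
import os


def describe_mount(path):
    """Summarise the state of `path` for the current user (hypothetical helper)."""
    return {
        # False if the path is missing or cannot be stat'ed
        "exists": os.path.exists(path),
        # True only if a filesystem is currently mounted on this directory
        "is_mount": os.path.ismount(path),
        # Listing a directory needs both read and execute (search) permission
        "readable": os.access(path, os.R_OK | os.X_OK),
    }


# Inspect the mount location used in the question
print(describe_mount("/tmp/test_dir4"))
```

Running it once before and once after mount_context.start() shows whether the call changed anything.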

I have found it best to avoid explicitly stating the mount location, like so:

dataset_mounted = dataset.mount()

This returns a mount context, which has to be activated and released using the .start() and .stop() methods (either explicitly or implicitly via a with statement).

While the mount context is active, you can use dataset_mounted.mount_point like you would a text string specifying a directory in file operations. For example, if your file dataset contains an image named spar.png, you could display it using the following code in a Jupyter notebook:

from IPython.display import Image, display

dataset_mounted = dataset.mount()
dataset_mounted.start()
test_image = Image(dataset_mounted.mount_point + '/spar.png')
display(test_image)
dataset_mounted.stop()
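
If you call .start() and .stop() yourself, an exception between the two calls would skip the .stop(). One way to guard against that is a try/finally block. This is only a sketch: the helper name read_from_mount is made up for illustration, and it assumes nothing beyond the mount(), start(), stop() and mount_point members shown above.

```python
import os


def read_from_mount(dataset, relative_path):
    """Mount `dataset`, read one file as bytes, and always unmount afterwards.

    `read_from_mount` is a hypothetical helper, not part of azureml.
    """
    mount_context = dataset.mount()
    mount_context.start()
    try:
        full_path = os.path.join(mount_context.mount_point, relative_path)
        with open(full_path, "rb") as handle:
            return handle.read()
    finally:
        # Runs even if opening or reading the file raised an exception
        mount_context.stop()
```

For the example above, read_from_mount(dataset, 'spar.png') would return the raw bytes of the image.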

I would also encourage using Python's with statement, which makes the code cleaner and reduces the chance that the mount is left open:

from IPython.display import Image, display

with dataset.mount() as mount_context:
    test_image = Image(mount_context.mount_point + '/spar.png')
    display(test_image)
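
The same pattern works for finding out what the dataset actually contains. The sketch below walks the mount point with os.walk while the mount is active and returns the relative paths. The function name list_mounted_files is made up for this illustration; the only azureml behaviour it relies on is the with dataset.mount() usage and the mount_point attribute shown above.

```python
import os


def list_mounted_files(dataset):
    """Return sorted relative paths of all files visible while `dataset` is mounted.

    `list_mounted_files` is a hypothetical helper, not part of azureml.
    """
    found = []
    with dataset.mount() as mount_context:
        root = mount_context.mount_point
        # The files are only reachable inside the with block
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                found.append(os.path.relpath(full_path, root))
    return sorted(found)
```

Collecting the paths into a list before leaving the with block matters, because the mount point is no longer usable once the context has exited.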