Azure DSVM: Cannot connect to the Docker daemon


We have been using the Data Science Virtual Machine (DSVM) together with a Virtual Machine Scale Set for our CI, running a custom Docker image in the connected Azure Pipelines.

https://github.com/PyTorchLightning/metrics/blob/77e252ec6165ec94e23ce5c5cf9ffdad01bf54a1/azure-pipelines.yml#L29

Recently we have been seeing the following failure message:

Starting: Initialize containers
/usr/bin/docker version --format '{{.Server.APIVersion}}'
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
'
##[error]Exit code 1 returned from process: file name '/usr/bin/docker', arguments 'version --format '{{.Server.APIVersion}}''.

See the full output here: https://dev.azure.com/PytorchLightning/Metrics/_build/results?buildId=9061&view=logs&j=fd70b5b8-241a-53bf-d137-3fd86cf9f066&t=a0ca1fe4-fde6-4a82-9888-52f5ae79d8fe

UPDATE: the issue was fixed in the June 2021 release; see the Azure DSVM release notes.


There are 3 best solutions below

1
On

The command below works on the latest Data Science Virtual Machine:

/usr/bin/docker --version

Docker version 20.10.6+azure, build 370c28948e3c12dce3d1df60b6f184990618553f

That output only shows that the Docker client is installed. The Docker daemon itself still has to be started, using the commands below:

sudo systemctl unmask docker

sudo systemctl start docker

sudo chmod 777 /var/run/docker.sock
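
For a CI agent it may be convenient to wrap the three commands above in a check so that they run only when the daemon is actually unreachable. The following is a minimal sketch, not a tested fix for the DSVM image: `ensure_docker` is a hypothetical helper name, and it assumes `docker info` fails whenever the daemon is down (that command has to contact the daemon, unlike `docker --version`).

```shell
# Sketch: start the Docker daemon only if it is not already reachable.
# ensure_docker is a hypothetical helper, not part of Docker or Azure tooling.
ensure_docker() {
    # `docker info` queries the daemon, so it fails when the daemon is down.
    if docker info >/dev/null 2>&1; then
        echo "docker daemon already running"
        return 0
    fi

    # Same steps as listed in the answer above.
    sudo systemctl unmask docker
    sudo systemctl start docker
    sudo chmod 777 /var/run/docker.sock

    # Re-check so the caller gets a clear exit status.
    if docker info >/dev/null 2>&1; then
        echo "docker daemon started"
        return 0
    fi
    echo "docker daemon still unreachable" >&2
    return 1
}
```

Calling `ensure_docker` at the start of a job, before any container step, would then either confirm the daemon or fail early with a non-zero exit code.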

0
On

Based on the discussion on the post above, the solution (for now) is to pin the version of the scale set image to a previous version:

az vmss update -g <resource group> -n <vmss name> --set virtualMachineProfile.storageProfile.imageReference.version=21.01.21

Docker appears to be disabled in the latest version of the DSVM image. Until that is corrected, pin the version. More generally, pinning the image version is a good idea for stability: you then change versions deliberately and know exactly which image your CI is running.
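
To see which image version a scale set currently references before changing it, the same property path can be read back with `az vmss show`. The sketch below is illustrative only: `pin_vmss_image` is a hypothetical helper name, and its three arguments (resource group, scale set name, target version) are placeholders you supply.

```shell
# Sketch: pin the scale set image version only when it differs from the target.
# pin_vmss_image is a hypothetical helper; arguments: <resource group> <vmss name> <version>
pin_vmss_image() {
    rg="$1"
    name="$2"
    version="$3"

    # Read the version currently referenced by the scale set model.
    current=$(az vmss show -g "$rg" -n "$name" \
        --query virtualMachineProfile.storageProfile.imageReference.version -o tsv)

    if [ "$current" = "$version" ]; then
        echo "already pinned to $version"
        return 0
    fi

    echo "changing image version from $current to $version"
    az vmss update -g "$rg" -n "$name" \
        --set virtualMachineProfile.storageProfile.imageReference.version="$version"
}
```

For example, `pin_vmss_image <resource group> <vmss name> 21.01.21` would apply the pin from the command above and report the previous value, which is useful to record if you later want to move forward again.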

0
On

Docker is enabled by default in the latest image release (21.06.01) of the Data Science Virtual Machine - Ubuntu 18, which should resolve this issue.