I am trying to run a model (a Python script) in script mode on AWS SageMaker. I use the TensorFlow estimator to invoke the script from a notebook, as shown below:
from sagemaker.tensorflow import TensorFlow

tf_estimator = TensorFlow(
    entry_point='train.py',
    role=role,
    train_instance_count=1,
    train_instance_type='local_gpu',
    framework_version='1.12',
    py_version='py3',
    script_mode=True,
    hyperparameters={'epochs': 10})

tf_estimator.fit({'training': training_path_input, 'validation': validation_path_input})
I get the error shown below:
> Creating tmpvq65nmup_algo-1-wipol_1 ...
> Creating tmpvq65nmup_algo-1-wipol_1 ... error
> ERROR: for tmpvq65nmup_algo-1-wipol_1 Cannot start service algo-1-wipol: OCI runtime create failed: container_linux.go:349:
> starting container process caused "process_linux.go:449: container
> init caused \"process_linux.go:432: running prestart hook 1 caused
> \\\"error running hook: exit status 1, stdout: , stderr:
> nvidia-container-cli: initialization error: nvml error: driver not
> loaded\\\\n\\\"\"": unknown
I would like to know how this can be fixed.
Hi, could you provide more information about your notebook instance, and which kernel you were running the notebook example with?
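The issue seems to be that the NVIDIA driver was not installed, or at least not loaded: the error above ends with "nvml error: driver not loaded". With train_instance_type='local_gpu' the training container runs on the notebook instance itself, so that instance needs a GPU with a working NVIDIA driver.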
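One way to guard against this from the notebook is to check for a working driver before choosing the instance type. The snippet below is only a rough sketch, not an official SageMaker recipe: it assumes that `nvidia-smi` being on the PATH and exiting with status 0 is a good enough sign that the driver is loaded, and the helper names `nvidia_driver_loaded` and `pick_local_instance_type` are made up for illustration.

```python
import shutil
import subprocess


def nvidia_driver_loaded() -> bool:
    """Return True if `nvidia-smi` is on the PATH and exits successfully.

    Assumption: a zero exit status from `nvidia-smi` means the NVIDIA
    driver is installed and loaded on this machine.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is missing, so the driver is most likely not installed.
        return False
    try:
        result = subprocess.run(
            [exe],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def pick_local_instance_type() -> str:
    """Fall back to CPU local mode when no working GPU driver is found."""
    return "local_gpu" if nvidia_driver_loaded() else "local"


instance_type = pick_local_instance_type()
print(instance_type)
```

You could then pass `instance_type` as `train_instance_type` in the estimator from the question. If it comes back as 'local' on a notebook instance you expected to have a GPU, the instance type or the driver setup is the thing to look at.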