The title has most of the question, but more context is below.
I tried following the directions found here: https://cloud.google.com/compute/docs/gpus/monitor-gpus
I modified the code a bit, but haven't been able to get it working. Here's an abbreviated version of the cloud config I've been running, showing the relevant parts:
- path: /etc/scripts/gpumonitor.sh
  permissions: "0644"
  owner: root
  content: |
    #!/bin/bash
    echo "Starting script..."
    sudo mkdir -p /etc/google
    cd /etc/google
    sudo git clone https://github.com/GoogleCloudPlatform/compute-gpu-monitoring.git
    echo "Downloaded Script..."
    echo "Starting up monitoring service..."
    sudo systemctl daemon-reload
    sudo systemctl --no-reload --now enable /etc/google/compute-gpu-monitoring/linux/systemd/google_gpu_monitoring_agent.service
    echo "Finished Script..."
- path: /etc/systemd/system/install-monitoring-gpu.service
  permissions: "0644"
  owner: root
  content: |
    [Unit]
    Description=Install GPU Monitoring
    Requires=install-gpu.service
    After=install-gpu.service
    [Service]
    User=root
    Type=oneshot
    RemainAfterExit=true
    ExecStart=/bin/bash /etc/scripts/gpumonitor.sh
    StandardOutput=journal+console
    StandardError=journal+console
runcmd:
  - systemctl start install-monitoring-gpu.service
Edit: It turned out the best approach was to build a Docker container with the monitoring script in it, then run that container from my config script, passing the GPU into the container as shown in the following link: https://cloud.google.com/container-optimized-os/docs/how-to/run-gpus
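For anyone landing here later, a rough sketch of what "passing the GPU into the container" can look like. This is not my exact script: the function name `gpu_docker_args` is made up for illustration, and the volume and device paths assume the NVIDIA drivers were installed to `/var/lib/nvidia` as described in the linked Container-Optimized OS page, so check them against your own VM. The function only prints the `docker run` arguments, one per line, and does not run anything.

```shell
#!/bin/sh
# Sketch only: prints the arguments for a `docker run` call that exposes the
# GPU to a container, one argument per line. Nothing is executed here.
# Assumptions: the NVIDIA drivers live under /var/lib/nvidia on the host, and
# the image name (first argument) is supplied by the caller.
gpu_docker_args() (
  image="$1"

  # Mount the host driver libraries and binaries, and expose the control devices.
  printf '%s\n' run --detach \
    --volume /var/lib/nvidia/lib64:/usr/local/nvidia/lib64 \
    --volume /var/lib/nvidia/bin:/usr/local/nvidia/bin \
    --device /dev/nvidiactl:/dev/nvidiactl \
    --device /dev/nvidia-uvm:/dev/nvidia-uvm

  # Add every GPU device node that exists (/dev/nvidia0, /dev/nvidia1, ...).
  for dev in /dev/nvidia[0-9]*; do
    if [ -e "$dev" ]; then
      printf '%s\n' --device "$dev:$dev"
    fi
  done

  # The image name goes last.
  printf '%s\n' "$image"
)
```

In bash on the VM, something like `mapfile -t args < <(gpu_docker_args my-gpu-monitoring-image) && docker "${args[@]}"` would then start the container, where `my-gpu-monitoring-image` is a placeholder for whatever image you built with the monitoring script in it.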