Running GPU monitoring on GCP in Container-Optimized OS


The title covers most of the question; more context is below.

I tried following the directions here: https://cloud.google.com/compute/docs/gpus/monitor-gpus

I modified the code a bit but haven't been able to get it working. Here is an abbreviated version of the cloud config I've been running, showing the relevant parts:

- path: /etc/scripts/gpumonitor.sh
  permissions: "0644"
  owner: root
  content: |
    #!/bin/bash
    echo "Starting script..."
    sudo mkdir -p /etc/google
    cd /etc/google
    sudo git clone https://github.com/GoogleCloudPlatform/compute-gpu-monitoring.git
    echo "Downloaded Script..."
    echo "Starting up monitoring service..."
    sudo systemctl daemon-reload
    sudo systemctl --no-reload --now enable /etc/google/compute-gpu-monitoring/linux/systemd/google_gpu_monitoring_agent.service
    echo "Finished Script..."
- path: /etc/systemd/system/install-monitoring-gpu.service
  permissions: "0644"
  owner: root
  content: |
    [Unit]
    Description=Install GPU Monitoring
    Requires=install-gpu.service
    After=install-gpu.service

    [Service]
    User=root
    Type=oneshot
    RemainAfterExit=true
    ExecStart=/bin/bash /etc/scripts/gpumonitor.sh
    StandardOutput=journal+console
    StandardError=journal+console
runcmd:
    - systemctl start install-monitoring-gpu.service
Edit: It turned out to be best to build a Docker container with the monitoring script in it and run that container from my config script, passing the GPU into the container as shown here: https://cloud.google.com/container-optimized-os/docs/how-to/run-gpus
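
For anyone landing here, this is a rough sketch of what that container invocation might look like. It is not my exact script. The image name `gcr.io/my-project/gpu-monitor:latest` and the `gpu_monitor_cmd` helper are hypothetical, and the volume and device paths assume the NVIDIA drivers were installed to `/var/lib/nvidia` as in the linked page, so check them against your own VM. The script only prints the command so it can be inspected before being run.

```shell
#!/bin/bash
# Hypothetical image name: an image you built that contains the monitoring script.
MONITOR_IMAGE="${MONITOR_IMAGE:-gcr.io/my-project/gpu-monitor:latest}"

# Emit a docker run command, one argument per line, that mounts the driver
# directories and passes each given GPU device node into the container.
# Assumption: drivers live under /var/lib/nvidia on the host.
gpu_monitor_cmd() {
  local cmd=(docker run --rm
    --volume /var/lib/nvidia/lib64:/usr/local/nvidia/lib64
    --volume /var/lib/nvidia/bin:/usr/local/nvidia/bin)
  local dev
  for dev in "$@"; do
    cmd+=(--device "${dev}:${dev}")
  done
  cmd+=("${MONITOR_IMAGE}")
  printf '%s\n' "${cmd[@]}"
}

# Dry run: print the command on one line instead of executing it.
gpu_monitor_cmd /dev/nvidia0 /dev/nvidia-uvm /dev/nvidiactl | paste -sd' ' -
```

Once the printed command looks right, it can be placed in the `ExecStart=` line of a systemd unit in the cloud config, in place of the script that clones the monitoring repository.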
