Nodes with different GPU capacity in MicroK8s Kubernetes, custom resource


I want Kubernetes to schedule pods on nodes while respecting each node's GPU memory capacity. Each pod needs a fixed amount, so I would like Kubernetes to count this allocation and not exceed a node's capacity. I do not need or want real-time measurement; just count the requests and don't over-allocate.

Based on the documentation, I add a custom resource to each node:

k annotate node webserver1 cluster-autoscaler.kubernetes.io/resource.cuda_0=47000
k annotate node webserver2 cluster-autoscaler.kubernetes.io/resource.cuda_0=14000
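For context, what the scheduler actually sees on a node can be inspected via the node's `capacity` and `allocatable` fields (standard kubectl, nothing specific to this setup):

```shell
# Show the resources the scheduler considers when placing pods;
# a custom resource must appear here to be schedulable against
kubectl get node webserver1 -o jsonpath='{.status.capacity}'
kubectl get node webserver1 -o jsonpath='{.status.allocatable}'
```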

Each pod gets a matching request and limit:

resources:
  requests:
    cuda_0: 2100
  limits:
    cuda_0: 2100

When I do this, the pods do not get scheduled. Is there a step I am missing?

The full YAML is here:

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: StatefulSet
metadata:
  name: transcribe-worker-statefulset-name
spec:
  podManagementPolicy: Parallel
  replicas: 20
  selector:
    matchLabels:
      app: transcribe-worker-pod # has to match .spec.template.metadata.labels below
  serviceName: transcribe-worker-service # needed for service to assign dns entries for each pod
  template:
    metadata:
      labels:
        app: transcribe-worker-pod # has to match .spec.selector.matchLabels above
    spec:
      containers:
        - image: localhost:32000/transcribe_worker_health_monitor:2022-12-03-m
          name: transcribe-worker-health-monitor
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: '/health-of-health-monitor'
              port: 8080
            initialDelaySeconds: 300
            periodSeconds: 15
            failureThreshold: 3
            timeoutSeconds: 10
        - image: localhost:32000/transcribe_worker:2023-07-18-b
          name: transcribe-worker-container # container name inside of the pod
          ports:
            - containerPort: 55001
              name: name-b
          livenessProbe:
            httpGet:
              path: '/health-of-transcriber'
              port: 8080
            initialDelaySeconds: 300
            periodSeconds: 15
            failureThreshold: 3
            timeoutSeconds: 10
          env:
            - name: DEVICE
              value: "cuda:0" #"cuda:1"
          resources:
            requests:
              cuda_0: 2100
            limits:
              cuda_0: 2100
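When the pods stay Pending, the scheduler's reason shows up in the pod events. Since StatefulSet pods are named `<statefulset-name>-<ordinal>`, the first replica here would be `transcribe-worker-statefulset-name-0`:

```shell
# The Events section at the bottom reports why the pod is unschedulable,
# e.g. "Insufficient cuda_0" if no node advertises that resource
kubectl describe pod transcribe-worker-statefulset-name-0

# Or list scheduling failures across the namespace
kubectl get events --field-selector reason=FailedScheduling
```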