I want Kubernetes to schedule pods on nodes while respecting each node's GPU memory capacity. Each pod requests some amount of GPU memory, and I would like Kubernetes to track these allocations and never exceed a node's capacity. I do not need or want real-time measurement, just bookkeeping: count what is requested and don't over-allocate.
Based on the documentation, I added a custom resource to each node:
k annotate node webserver1 cluster-autoscaler.kubernetes.io/resource.cuda_0=47000
k annotate node webserver2 cluster-autoscaler.kubernetes.io/resource.cuda_0=14000
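As a sanity check, I also looked at what the nodes actually advertise, since as far as I understand the scheduler only counts resources that appear under a node's status.capacity / status.allocatable. The extended-resources docs additionally describe advertising capacity by PATCHing node status through kubectl proxy (a sketch; example.com/cuda_0 is a placeholder name, not something from my cluster):

```shell
# Does the custom resource show up in what the scheduler sees?
kubectl get node webserver1 -o jsonpath='{.status.allocatable}'

# The extended-resources docs advertise capacity by patching node status
# directly (the ~1 escapes the "/" in the resource name):
kubectl proxy &
curl --header "Content-Type: application/json-patch+json" \
  --request PATCH \
  --data '[{"op": "add", "path": "/status/capacity/example.com~1cuda_0", "value": "47000"}]' \
  http://localhost:8001/api/v1/nodes/webserver1/status
```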
Each pod gets a matching resource request and limit:
resources:
  requests:
    cuda_0: 2100
  limits:
    cuda_0: 2100
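One thing I was unsure about: the extended-resource examples in the docs all use a domain-qualified name (e.g. example.com/dongle), so possibly a bare cuda_0 is not even a valid resource name. A variant of the same request following that pattern (example.com is just an illustrative placeholder):

```yaml
resources:
  requests:
    example.com/cuda_0: 2100
  limits:
    example.com/cuda_0: 2100
```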
When I do this, the pods do not get scheduled. Is there a step I am missing?
The full YAML is here:
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: StatefulSet
metadata:
  name: transcribe-worker-statefulset-name
spec:
  podManagementPolicy: Parallel
  replicas: 20
  selector:
    matchLabels:
      app: transcribe-worker-pod # has to match .spec.template.metadata.labels below
  serviceName: transcribe-worker-service # needed for service to assign dns entries for each pod
  template:
    metadata:
      labels:
        app: transcribe-worker-pod # has to match .spec.selector.matchLabels above
    spec:
      containers:
      - image: localhost:32000/transcribe_worker_health_monitor:2022-12-03-m
        name: transcribe-worker-health-monitor
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: '/health-of-health-monitor'
            port: 8080
          initialDelaySeconds: 300
          periodSeconds: 15
          failureThreshold: 3
          timeoutSeconds: 10
      - image: localhost:32000/transcribe_worker:2023-07-18-b
        name: transcribe-worker-container # container name inside of the pod
        ports:
        - containerPort: 55001
          name: name-b
        livenessProbe:
          httpGet:
            path: '/health-of-transcriber'
            port: 8080
          initialDelaySeconds: 300
          periodSeconds: 15
          failureThreshold: 3
          timeoutSeconds: 10
        env:
        - name: DEVICE
          value: "cuda:0" #"cuda:1"
        resources:
          requests:
            cuda_0: 2100
          limits:
            cuda_0: 2100