I have a Job object that should use a node selector so that it only runs on nodes that have a GPU. I know how to set it (the manifest is built from a string in a Python program).
job = f"""
apiVersion: batch/v1
kind: Job
....
nodeSelector:
  sma-gpu-size: {gpu_size}
"""
Our ops team will set these selectors in the next few weeks, but currently, when I set the node selector, the service is not able to start.
2022-09-20T07:20:24Z [Warning] 0/35 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 30 node(s) didn't match Pod's node affinity/selector.
Is it somehow possible to use these node_selectors only if they are available, something like this (pseudo yaml)?
job = f"""
apiVersion: batch/v1
kind: Job
....
nodeSelector:
  if_available:
    sma-gpu-size: {gpu_size}
  else:
    Any
"""
It's not, but you can replace the nodeSelector with a nodeAffinity to achieve that. From docs:
After the label has been added, you can switch to requiredDuringSchedulingIgnoredDuringExecution or back to nodeSelector.
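As a rough sketch of the nodeAffinity approach, here is how the Python string could look with the soft rule preferredDuringSchedulingIgnoredDuringExecution, which makes the scheduler favor nodes carrying the label but still place the pod elsewhere if no node matches. The function name build_job_manifest, the Job name, the container name, the image, and the value "large" are placeholders I made up for illustration; only the sma-gpu-size key and the {gpu_size} substitution come from the question.

```python
def build_job_manifest(gpu_size: str) -> str:
    """Return a Job manifest that prefers, but does not require, GPU nodes."""
    # Job name, container name, and image are hypothetical placeholders.
    return f"""\
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-job-example
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: example.invalid/worker:latest
      affinity:
        nodeAffinity:
          # Soft rule: matching nodes are favored, but the pod can still
          # be scheduled on other nodes when no node has the label.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: sma-gpu-size
                operator: In
                values:
                - "{gpu_size}"
"""


# "large" is an assumed example value for the label.
job = build_job_manifest("large")
print(job)
```

Once the labels exist on the nodes, swapping the preferred block for requiredDuringSchedulingIgnoredDuringExecution (which uses nodeSelectorTerms instead of weight/preference) turns the preference into a hard requirement.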