I am looking for how to have a spare/cold replica/pod in my Kubernetes configuration. I assume it would go in my Kuberentes deployment or HPA configuration. Any idea how I would make it so I have 2 spare/cold instances of my app always ready, but only get put into the active pods once HPA requests another instance? My goal is to have basically zero startup time on a new pod when HPA says it needs another instance.
apiVersion: apps/v1
kind: Deployment
metadata:
name: someName
namespace: someNamespace
labels:
app: someName
version: "someVersion"
spec:
replicas: $REPLICAS
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: someMaxSurge
maxUnavailable: someMaxUnavailable
selector:
matchLabels:
app: someName
version: someVersion
template:
metadata:
labels:
app: someName
version: "someVersion"
spec:
containers:
- name: someName
image: someImage:someVersion
imagePullPolicy: Always
resources:
limits:
memory: someMemory
cpu: someCPU
requests:
memory: someMemory
cpu: someCPU
readinessProbe:
failureThreshold: someFailureThreshold
initialDelaySeconds: someInitialDelaySeconds
periodSeconds: somePeriodSeconds
timeoutSeconds: someTimeoutSeconds
livenessProbe:
httpGet:
path: somePath
port: somePort
failureThreshold: someFailureThreshold
initialDelaySeconds: someInitialDelay
periodSeconds: somePeriodSeconds
timeoutSeconds: someTimeoutSeocnds
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
name: someName
namespace: someNamespace
spec:
minAvailable: someMinAvailable
selector:
matchLabels:
app: someName
version: "someVersion"
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
name: someName-hpa
namespace: someNamespace
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: someName
minReplicas: someMinReplicas
maxReplicas: someMaxReplicas
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: someAverageUtilization
It is a good practice to have at least two replicas for services on Kubernetes. This helps if e.g. a node goes down or you need to do maintenance of the node. Also set Pod Topology Spread Constraints so that those pods are scheduled to run on different nodes.
Set the number of replicas that you minimum want as desired state. In Kubernetes, traffic will be load balanced to the replicas. Also use Horizontal Pod Autoscaler to define when you want to autoscale to more replicas. You can set the requirements low for autoscaling, if you want to scale up early.