Flink Kubernetes deployment - the HPA controller was unable to get a selector


I am deploying a stateful Flink app using the following YAML file.

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: simple-flink
spec:
  image: flink-1.17-python-iceberg:1.17
  flinkVersion: v1_16
  ingress:
    template: "{{name}}.{{namespace}}.flink.k8s.io"
    className: "nginx"
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: 50m
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "1"
    state.savepoints.dir: file:///flink-data/savepoints
    state.checkpoints.dir: file:///flink-data/checkpoints
    high-availability.type: kubernetes
    high-availability.storageDir: file:///flink-data/ha
    rest.client-max-content-length: "1004857600"
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
          - mountPath: /flink-data
            name: flink-volume
          env:
          - name: HADOOP_CONF_DIR
            value: "/opt/hadoop-2.8.5/etc/hadoop:/opt/hadoop-2.8.5/share/hadoop/common/lib/*:/opt/hadoop-2.8.5/share/hadoop/common/*:/opt/hadoop-2.8.5/share/hadoop/hdfs:/opt/hadoop-2.8.5/share/hadoop/hdfs/lib/*:/opt/hadoop-2.8.5/share/hadoop/hdfs/*:/opt/hadoop-2.8.5/share/hadoop/yarn/lib/*:/opt/hadoop-2.8.5/share/hadoop/yarn/*:/opt/hadoop-2.8.5/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.8.5/share/hadoop/mapreduce/*:/opt/hadoop-2.8.5/contrib/capacity-scheduler/*.jar"
      volumes:
      - name: flink-volume
        hostPath:
          path: /tmp
          type: Directory

The Flink jobs run without problems. For autoscaling, I created an HPA using the following manifest.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: simple-flink
  namespace: default
spec:
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageValue: 10Mi
  scaleTargetRef:
    apiVersion: flink.apache.org/v1beta1
    kind: FlinkDeployment
    name: simple-flink
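
A side note on the metric target above: it pairs `type: Utilization` with `averageValue`, whereas the autoscaling/v2 API documents `averageUtilization` as the field for `Utilization` and `averageValue` as the field for `AverageValue`. The API server accepted the manifest, so this pairing may not be enforced, and it is probably unrelated to the selector error. The following is only a minimal consistency check in Python; the function name `check_metric_target` is made up for this illustration.

```python
def check_metric_target(target: dict) -> list[str]:
    """Return a list of inconsistencies found in an autoscaling/v2 MetricTarget."""
    # Documented pairing of each target type with its value field.
    expected_field = {
        "Utilization": "averageUtilization",
        "AverageValue": "averageValue",
        "Value": "value",
    }
    kind = target.get("type")
    if kind not in expected_field:
        return [f"unknown target type: {kind!r}"]

    problems = []
    wanted = expected_field[kind]
    if wanted not in target:
        problems.append(f"type {kind} expects the field {wanted}")
    # Flag value fields that are documented for a different target type.
    for field in expected_field.values():
        if field != wanted and field in target:
            problems.append(f"field {field} does not belong to type {kind}")
    return problems


# The target from the HPA above.
print(check_metric_target({"type": "Utilization", "averageValue": "10Mi"}))
# The same value with the matching type.
print(check_metric_target({"type": "AverageValue", "averageValue": "10Mi"}))
```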

When I describe the HPA, I get the following error.

Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ----                   ----                       -------
  Warning  FailedComputeMetricsReplicas  6m23s (x12 over 9m8s)  horizontal-pod-autoscaler  selector is required
  Warning  SelectorRequired              4m8s (x21 over 9m8s)   horizontal-pod-autoscaler  selector is required

When I run "kubectl describe hpa simple-flink", I get the following status info:

status:
  conditions:
  - lastTransitionTime: "2023-12-19T13:42:00Z"
    message: the HPA controller was able to get the target's current scale
    reason: SucceededGetScale
    status: "True"
    type: AbleToScale
  - lastTransitionTime: "2023-12-19T13:42:00Z"
    message: the HPA target's scale is missing a selector
    reason: InvalidSelector
    status: "False"
    type: ScalingActive

As suggested in this other thread, https://stackoverflow.com/questions/73075996/flink-kubernetes-deployment-the-hpa-controller-was-unable-to-get-the-targets, I ran the following to update the CRD to the latest version:

git clone https://github.com/apache/flink-kubernetes-operator
cd flink-kubernetes-operator
kubectl replace -f helm/flink-kubernetes-operator/crds/flinkdeployments.flink.apache.org-v1.yml

After that, I recreated the deployment and the HPA, but I still get the same error.
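
For context, the "selector is required" message refers to the `status.selector` of the target's `/scale` subresource. For a custom resource, Kubernetes fills that value from the field named by `labelSelectorPath` in the CRD's scale subresource, and leaves it empty when the resource has nothing at that path. A rough Python sketch of how both sides could be inspected is below. The helper names and the sample dictionaries (including the path `.status.example.selector`) are hypothetical; real input would come from `kubectl get crd flinkdeployments.flink.apache.org -o json` and `kubectl get flinkdeployment simple-flink -o json`.

```python
def label_selector_paths(crd: dict) -> dict:
    """Map each CRD version name to its scale labelSelectorPath (None if unset)."""
    paths = {}
    for version in crd.get("spec", {}).get("versions", []):
        scale = (version.get("subresources") or {}).get("scale") or {}
        paths[version.get("name")] = scale.get("labelSelectorPath")
    return paths


def lookup(obj: dict, json_path: str):
    """Follow a simple dotted path such as '.status.foo.bar'; None if absent."""
    current = obj
    for key in json_path.strip(".").split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical, hand-written fragments for illustration only.
crd = {
    "spec": {
        "versions": [
            {
                "name": "v1beta1",
                "subresources": {
                    "scale": {"labelSelectorPath": ".status.example.selector"}
                },
            }
        ]
    }
}
resource = {"status": {}}

for name, path in label_selector_paths(crd).items():
    # A missing path or a missing value both leave the HPA without a selector.
    value = lookup(resource, path) if path else None
    print(name, path, value)
```

If the path is set in the CRD but the resource has no value there, replacing the CRD alone would not be expected to change the HPA's behaviour.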

Thanks a lot for any suggestions on how to fix this problem.
