Azure Arc on Google Cloud GKE Cluster


I want to set up Azure Arc on a Google Cloud GKE Autopilot cluster so I can manage its Kubernetes resources from Azure. This is both my first GKE cluster and my first Azure Arc connection. I am following the quickstart here (https://learn.microsoft.com/en-us/azure/azure-arc/kubernetes/quickstart-connect-cluster?tabs=azure-cli#prerequisites). I have an active GKE cluster, and there is an Azure CLI command that both establishes the link and deploys resources via Helm to my GKE cluster (which is set as the default kubectl context).
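For reference, this is the shape of the command from the quickstart; the resource group and connected-cluster names below are placeholders, not my actual values:

```shell
# Create a resource group to hold the Arc connected-cluster resource.
az group create --name ArcResourceGroup --location eastus

# Connect the cluster that kubectl currently points at (my GKE Autopilot
# cluster) to Azure Arc. This is the step that deploys the Arc agents,
# including the cluster-diagnostic-checks job, via Helm.
az connectedk8s connect \
  --name my-gke-cluster \
  --resource-group ArcResourceGroup \
  --location eastus
```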

The job deployed to my GKE cluster always fails. Here is the kubectl describe output for the job on my cluster (captured while it was still running):

Name:             cluster-diagnostic-checks-job
Namespace:        azure-arc-release
Selector:         controller-uid=1285d828-698e-4e7d-b03d-ac819e793024
Labels:           app=cluster-diagnostic-checks
                  app.kubernetes.io/managed-by=Helm
Annotations:      autopilot.gke.io/resource-adjustment:
                    {"input":{"containers":[{"name":"cluster-diagnostic-checks-container"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storag...
                  batch.kubernetes.io/job-tracking: 
                  meta.helm.sh/release-name: cluster-diagnostic-checks
                  meta.helm.sh/release-namespace: azure-arc-release
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Start Time:       Tue, 16 May 2023 10:17:09 -0700
Pods Statuses:    1 Active (0 Ready) / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=cluster-diagnostic-checks
                    controller-uid=1285d828-698e-4e7d-b03d-ac819e793024
                    job-name=cluster-diagnostic-checks-job
  Service Account:  cluster-diagnostic-checkssa
  Containers:
   cluster-diagnostic-checks-container:
    Image:      mcr.microsoft.com/azurearck8s/clusterdiagnosticchecks:v0.1.0
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      /cluster_diagnostic_checks_job_script.sh
    Args:
      None
      None
      None
      eastus
      AZUREPUBLICCLOUD
    Limits:
      cpu:                500m
      ephemeral-storage:  1Gi
      memory:             2Gi
    Requests:
      cpu:                500m
      ephemeral-storage:  1Gi
      memory:             2Gi
    Environment:          <none>
    Mounts:               <none>
  Volumes:                <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  10s   job-controller  Created pod: cluster-diagnostic-checks-job-dkql8

And here is the describe output for the pod it created:

Name:             cluster-diagnostic-checks-job-dkql8
Namespace:        azure-arc-release
Priority:         0
Service Account:  cluster-diagnostic-checkssa
Node:             <none>
Labels:           app=cluster-diagnostic-checks
                  controller-uid=1285d828-698e-4e7d-b03d-ac819e793024
                  job-name=cluster-diagnostic-checks-job
Annotations:      <none>
Status:           Pending
SeccompProfile:   RuntimeDefault
IP:               
IPs:              <none>
Controlled By:    Job/cluster-diagnostic-checks-job
Containers:
  cluster-diagnostic-checks-container:
    Image:      mcr.microsoft.com/azurearck8s/clusterdiagnosticchecks:v0.1.0
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      /cluster_diagnostic_checks_job_script.sh
    Args:
      None
      None
      None
      eastus
      AZUREPUBLICCLOUD
    Limits:
      cpu:                500m
      ephemeral-storage:  1Gi
      memory:             2Gi
    Requests:
      cpu:                500m
      ephemeral-storage:  1Gi
      memory:             2Gi
    Environment:          <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5gxkd (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  kube-api-access-5gxkd:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 kubernetes.io/arch=amd64:NoSchedule
                             kubernetes.io/arch=arm64:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From                                   Message
  ----     ------            ----  ----                                   -------
  Warning  FailedScheduling  16s   gke.io/optimize-utilization-scheduler  0/3 nodes are available: 1 node(s) had untolerated taint {ToBeDeletedByClusterAutoscaler: 1684257394}, 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/3 nodes are available: 1 Preemption is not helpful for scheduling, 2 No preemption victims found for incoming pod.
  Normal   TriggeredScaleUp  11s   cluster-autoscaler                     pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/subscripify/zones/us-central1-a/instanceGroups/gk3-autopilot-cluster-1-pool-1-3cb7bde1-grp 0->1 (max: 1000)}]

Unfortunately, the container does not produce any logs whatsoever.
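In case it helps anyone reproduce this, these are the kinds of commands I have been using to watch the pod while it sits in Pending (the namespace comes from the describe output above):

```shell
# Watch the pod until it is scheduled (or keeps waiting on the scale-up).
kubectl get pods -n azure-arc-release -w

# Recent events in the namespace: FailedScheduling, TriggeredScaleUp, and
# (if the scale-up succeeds) Scheduled/Pulled/Started would show up here.
kubectl get events -n azure-arc-release --sort-by=.lastTimestamp

# Once the pod actually starts, its logs become available:
kubectl logs -n azure-arc-release job/cluster-diagnostic-checks-job
```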

I don't think this is a resource problem: I am looking at the resource quota limits in the Google Cloud console (https://console.cloud.google.com/iam-admin/quotas?project=my-project) and they seem adequate, but I am a little less experienced with Google Cloud than I am with Azure. Has anyone out there tried this (specifically, Azure Arc connected to a GKE Autopilot cluster) and been successful? If so, can you offer a little nudge in the right direction?
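To rule out a regional quota limit, I also checked quotas from the CLI; this assumes the region is us-central1, which is where the TriggeredScaleUp event above says the autoscaler is adding a node:

```shell
# List compute quota metrics, limits, and current usage for the region
# the autoscaler is scaling in (us-central1 per the event above).
gcloud compute regions describe us-central1 \
  --flatten="quotas[]" \
  --format="table(quotas.metric,quotas.limit,quotas.usage)"
```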
