How to recover a Kubernetes cluster created on AWS using kOps?


We were trying to upgrade the kOps version of our Kubernetes cluster. These are the steps we followed:

  1. Download the latest kOps version, 1.24 (the old version is 1.20)
  2. Update the cluster template files for 1.24
  3. Set the environment variables:
export KUBECONFIG="<< Kubeconfig file >>"
export AWS_PROFILE="<< AWS profile name >>"
export AWS_DEFAULT_REGION="<< AWS region >>"
export KOPS_STATE_STORE="s3://<< AWS S3 bucket name >>"
export NAME="<< kOps cluster name >>"
  4. kops get $NAME -o yaml > existing-cluster.yaml

  5. kops toolbox template --template templates/tm-eck-mixed-instances.yaml --values values_files/values-us-east-1.yaml --snippets snippets --output cluster.yaml --name $NAME

  6. kops replace -f cluster.yaml

  7. kops update cluster --name $NAME

  8. kops rolling-update cluster --name $NAME --instance-group=master-us-east-1a --yes --cloudonly
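
After the rolling update, it is worth confirming that the new master actually rejoined before moving on; --cloudonly skips kOps' own validation, so this does not happen automatically. A minimal sketch, assuming KUBECONFIG still points at the cluster:

    # Wait up to 10 minutes for kOps to consider the cluster healthy
    kops validate cluster --name $NAME --wait 10m

    # Cross-check from the API side: every master should show Ready
    kubectl get nodes -o wide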

Once the master was rolled, I noticed that it had not joined the cluster. After a few rounds of troubleshooting, I found the error below in the control-plane logs (the kube-controller-manager cannot reach the API server):

I0926 09:54:41.220817 1 flags.go:59] FLAG: --vmodule=""
I0926 09:54:41.223834 1 dynamic_serving_content.go:111] Loaded a new cert/key pair for "serving-cert::/srv/kubernetes/kube-controller-manager/server.crt::/srv/kubernetes/kube-controller-manager/server.key"
unable to load configmap based request-header-client-ca-file: Get "https://127.0.0.1/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": dial tcp 127.0.0.1:443: connect: connection refused
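
The "connection refused" on 127.0.0.1:443 indicates that kube-apiserver itself is not listening on the master, so the controller-manager has nothing to reach. A few commands along these lines can narrow it down (a sketch, assuming SSH access to the master; kOps 1.24 nodes default to containerd, and kops-configuration is the nodeup bootstrap unit):

    # Is the kube-apiserver container running at all?
    sudo crictl ps -a | grep kube-apiserver

    # The kubelet log usually says why a static pod failed to start
    sudo journalctl -u kubelet --no-pager | tail -n 50

    # The kOps bootstrap (nodeup) log can reveal configuration errors
    sudo journalctl -u kops-configuration --no-pager | tail -n 50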

I tried to resolve this issue but couldn't find a way, so I decided to roll back using a backup. These are the steps I followed for that:

  1. kops replace -f cluster.yaml
  2. kops update cluster --name $NAME
  3. kops rolling-update cluster --name $NAME --instance-group=master-us-east-1a --yes --cloudonly
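
Before kicking off the rolling update again, it can help to confirm that the spec in the state store really matches the pre-upgrade backup (a small sketch reusing the existing-cluster.yaml file from step 4 above):

    # Any output here means the state store still differs from the backup
    kops get $NAME -o yaml | diff existing-cluster.yaml -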

Still, I'm getting the same error on the master node.

Does anyone know how I can restore the cluster using kOps?


1 Answer

Samith Perera (Best Answer)

After a few rounds of troubleshooting, I found that whenever we deploy a new version using kOps, it creates a new version of the launch template in AWS. I manually changed the launch template version used by the Auto Scaling group of each node group. The cluster then rolled back to the previous state and started working properly. After adding the missing configuration to the kOps template file, I reran the upgrade process.
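
For anyone hitting the same problem, that rollback can also be done with the AWS CLI; the sketch below is illustrative only, and the launch template ID, version number, and Auto Scaling group name are placeholders to be read from your own account:

    # List the versions kOps has created for this instance group's launch template
    aws ec2 describe-launch-template-versions \
      --launch-template-id lt-0123456789abcdef0 \
      --query 'LaunchTemplateVersions[].[VersionNumber,CreateTime]'

    # Point the Auto Scaling group back at the known-good version
    aws autoscaling update-auto-scaling-group \
      --auto-scaling-group-name master-us-east-1a.masters.example.k8s.local \
      --launch-template LaunchTemplateId=lt-0123456789abcdef0,Version=1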