I am running an OKD 4.5 cluster with 3 master nodes on AWS, installed using openshift-install. While attempting to update the cluster to 4.5.0-0.okd-2020-09-04-180756, I have run into numerous issues.
The current issue is that the console and apiserver pods on one master node are in CrashLoopBackOff due to what appears to be an internal networking issue.
The logs of the apiserver pod are as follows:
Copying system trust bundle
I0911 15:59:15.763716 1 dynamic_serving_content.go:111] Loaded a new cert/key pair for "serving-cert::/var/run/secrets/serving-cert/tls.crt::/var/run/secrets/serving-cert/tls.key"
F0911 15:59:19.556715 1 cmd.go:72] unable to load configmap based request-header-client-ca-file: Get https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 172.30.0.1:443: connect: no route to host
I have tried deleting the pods, but the new ones go into CrashLoopBackOff as well.
Update: I removed the troubled master and added a new machine to build a new master. The apiserver and console are no longer failing, but now etcd is. Its log output is as follows:
#### attempt 9
member={name="ip-172-99-6-251.ec2.internal", peerURLs=[https://172.99.6.251:2380}, clientURLs=[https://172.99.6.251:2379]
member={name="ip-172-99-6-200.ec2.internal", peerURLs=[https://172.99.6.200:2380}, clientURLs=[https://172.99.6.200:2379]
member={name="ip-172-99-6-249.ec2.internal", peerURLs=[https://172.99.6.249:2380}, clientURLs=[https://172.99.6.249:2379]
target=nil, err=<nil>
#### sleeping...
Note: 172.99.6.251 is the IP of the master node that this one replaced.
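
To make that observation concrete, here is a minimal Python sketch that parses the member lines from the log above and flags any member whose peer IP belongs to a removed node. This is not part of any OKD tooling; the regex, the function names, and the `REMOVED_IPS` set are my own assumptions for illustration, and the only removed IP I know of is the one mentioned in the note.

```python
import re

# Member lines copied verbatim from the etcd log output above.
LOG = """\
member={name="ip-172-99-6-251.ec2.internal", peerURLs=[https://172.99.6.251:2380}, clientURLs=[https://172.99.6.251:2379]
member={name="ip-172-99-6-200.ec2.internal", peerURLs=[https://172.99.6.200:2380}, clientURLs=[https://172.99.6.200:2379]
member={name="ip-172-99-6-249.ec2.internal", peerURLs=[https://172.99.6.249:2380}, clientURLs=[https://172.99.6.249:2379]
"""

# Assumption: IPs of master nodes that have been removed from the cluster.
REMOVED_IPS = {"172.99.6.251"}

# Hypothetical pattern matching the member line format seen in the log.
MEMBER_RE = re.compile(r'member=\{name="([^"]+)", peerURLs=\[https://([\d.]+):\d+')


def parse_members(log):
    """Return a list of (name, peer_ip) tuples, one per member line."""
    return MEMBER_RE.findall(log)


def stale_members(log, removed_ips):
    """Return names of members whose peer IP is in removed_ips."""
    return [name for name, ip in parse_members(log) if ip in removed_ips]


if __name__ == "__main__":
    for name in stale_members(LOG, REMOVED_IPS):
        print("stale member:", name)
```

Run against the log above, this reports the member named after the old node, which suggests the etcd member list still contains the master I removed.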