Rook ceph broken on kubernetes?

1k Views Asked by Thomas Decaux At 07 September 2020 at 19:16

Using Ceph v1.14.10, Rook v1.3.8 on k8s 1.16 on-premise. After 10 days without any trouble, we decided to drain some nodes. Since then, none of the rescheduled pods can attach to their PVs any more; it looks like the Ceph cluster is broken:

My ConfigMap rook-ceph-mon-endpoints is referencing 2 missing mon pod IPs:

csi-cluster-config-json: '[{"clusterID":"rook-ceph","monitors":["10.115.0.129:6789","10.115.0.4:6789","10.115.0.132:6789"]}]'

But the running mon pods have different IPs:

kubectl -n rook-ceph get pod -l app=rook-ceph-mon -o wide

NAME                               READY   STATUS    RESTARTS   AGE     IP             NODE                    NOMINATED NODE   READINESS GATES
rook-ceph-mon-e-56b849775-4g5wg    1/1     Running   0          6h42m   10.115.0.2     XXXX   <none>           <none>
rook-ceph-mon-h-fc486fb5c-8mvng    1/1     Running   0          6h42m   10.115.0.134   XXXX   <none>           <none>
rook-ceph-mon-i-65666fcff4-4ft49   1/1     Running   0          30h     10.115.0.132   XXXX   <none>           <none>
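
The comparison above can also be scripted. What follows is only a minimal sketch, not a Rook tool: the function names `config_mon_ips`, `stale_mon_ips` and `check_cluster` are made up for illustration, and it assumes the monitors are listed as IPv4 `ip:port` strings, as in the ConfigMap shown above. It only reports the mismatch and changes nothing in the cluster.

```shell
#!/bin/sh
# Sketch: list monitor IPs found in csi-cluster-config-json that no running mon pod has.
# All function names here are hypothetical helpers, not part of Rook or kubectl.

# Print the IP part of every IPv4 "ip:port" entry found in the JSON string $1.
config_mon_ips() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+){3}:[0-9]+' | cut -d: -f1
}

# $1 = csi-cluster-config-json value, $2 = whitespace-separated mon pod IPs.
# Print each configured monitor IP that is absent from the pod IP list.
stale_mon_ips() {
  # Normalise separators to single spaces and pad both ends for exact matching.
  pod_ips=" $(printf '%s' "$2" | tr -s '[:space:]' ' ') "
  for ip in $(config_mon_ips "$1"); do
    case "$pod_ips" in
      *" $ip "*) ;;                  # a running mon pod has this IP
      *) printf '%s\n' "$ip" ;;      # no pod has it: stale entry
    esac
  done
}

# Fetch both inputs from the cluster and compare them (needs kubectl access;
# assumes the rook-ceph namespace and labels used in the question).
check_cluster() {
  cfg=$(kubectl -n rook-ceph get cm rook-ceph-mon-endpoints \
        -o jsonpath='{.data.csi-cluster-config-json}')
  ips=$(kubectl -n rook-ceph get pod -l app=rook-ceph-mon \
        -o jsonpath='{.items[*].status.podIP}')
  stale_mon_ips "$cfg" "$ips"
}
```

Calling `check_cluster` with the values shown above would print 10.115.0.129 and 10.115.0.4, the two monitor addresses that no longer belong to any mon pod.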

Is this normal, or must I run some kind of "reconciliation" task to update the ConfigMap with the new mon pod IPs?

(could be related to https://github.com/rook/rook/issues/2262)

I had to manually update:

secret rook-ceph-config
cm rook-ceph-mon-endpoints
cm rook-ceph-csi-config


There is 1 best solution below

Thomas Decaux On 10 September 2020 at 15:48

As @travisn said:

The operator owns updating that configmap and secret. It's not expected to update them manually unless there is some disaster recovery situation as described at https://rook.github.io/docs/rook/v1.4/ceph-disaster-recovery.html.
