Rook Ceph Provisioning issue


I am having an issue when trying to create my PVC: it appears the provisioner is unable to create the volume.

k describe pvc avl-vam-pvc-media-ceph
Name:          avl-vam-pvc-media-ceph
Namespace:     default
StorageClass:  rook-ceph-block
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type     Reason                Age                From                                                                                                        Message
  ----     ------                ----               ----                                                                                                        -------
  Normal   ExternalProvisioning  10s (x5 over 67s)  persistentvolume-controller                                                                                 waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          5s (x8 over 67s)   rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6799bd4cb7-sv4gz_73756eff-f42e-4d8f-8448-d5dedd94d1f2  External provisioner is provisioning volume for claim "default/avl-vam-pvc-media-ceph"
  Warning  ProvisioningFailed    5s (x8 over 67s)   rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6799bd4cb7-sv4gz_73756eff-f42e-4d8f-8448-d5dedd94d1f2  failed to provision volume with StorageClass "rook-ceph-block": rpc error: code = InvalidArgument desc = multi node access modes are only supported on rbd `block` type volumes

Below is my PVC yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: avl-vam-pvc-media-ceph
spec:
  storageClassName: "rook-ceph-block"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

I used ./rook/cluster/examples/kubernetes/ceph/csi/rbd/storageclass.yaml to create my StorageClass, and I am confused about where this is going wrong.
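
For reference, the RBD example StorageClass in that path targets a replicated block pool and looks roughly like the following (an abridged sketch based on the upstream example; the clusterID, pool name, and CSI secret parameters are assumptions and vary by Rook version and cluster):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph      # assumed default cluster namespace
  pool: replicapool         # assumed default pool from the example CephBlockPool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true

Note that the provisioner here is the RBD (block) driver, not CephFS, which is why the access-mode restriction in the error applies.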

One other thing I find odd in my Ceph cluster is that my PGs appear to be stuck undersized:

ceph health detail
HEALTH_WARN Degraded data redundancy: 33 pgs undersized
[WRN] PG_DEGRADED: Degraded data redundancy: 33 pgs undersized
    pg 1.0 is stuck undersized for 51m, current state active+undersized, last acting [1,0]
    pg 2.0 is stuck undersized for 44m, current state active+undersized, last acting [3,0]
    pg 2.1 is stuck undersized for 44m, current state active+undersized, last acting [2,5]
    pg 2.2 is stuck undersized for 44m, current state active+undersized, last acting [5,4]
    pg 2.3 is stuck undersized for 44m, current state active+undersized, last acting [5,4]
    pg 2.4 is stuck undersized for 44m, current state active+undersized, last acting [2,1]
    pg 2.5 is stuck undersized for 44m, current state active+undersized, last acting [3,4]
    pg 2.6 is stuck undersized for 44m, current state active+undersized, last acting [2,3]
    pg 2.7 is stuck undersized for 44m, current state active+undersized, last acting [3,2]
    pg 2.8 is stuck undersized for 44m, current state active+undersized, last acting [3,0]
    pg 2.9 is stuck undersized for 44m, current state active+undersized, last acting [4,1]
    pg 2.a is stuck undersized for 44m, current state active+undersized, last acting [2,3]
    pg 2.b is stuck undersized for 44m, current state active+undersized, last acting [3,4]
    pg 2.c is stuck undersized for 44m, current state active+undersized, last acting [2,3]
    pg 2.d is stuck undersized for 44m, current state active+undersized, last acting [0,1]
    pg 2.e is stuck undersized for 44m, current state active+undersized, last acting [2,3]
    pg 2.f is stuck undersized for 44m, current state active+undersized, last acting [1,0]
    pg 2.10 is stuck undersized for 44m, current state active+undersized, last acting [2,1]
    pg 2.11 is stuck undersized for 44m, current state active+undersized, last acting [3,4]
    pg 2.12 is stuck undersized for 44m, current state active+undersized, last acting [3,2]
    pg 2.13 is stuck undersized for 44m, current state active+undersized, last acting [0,5]
    pg 2.14 is stuck undersized for 44m, current state active+undersized, last acting [3,4]
    pg 2.15 is stuck undersized for 44m, current state active+undersized, last acting [4,3]
    pg 2.16 is stuck undersized for 44m, current state active+undersized, last acting [5,2]
    pg 2.17 is stuck undersized for 44m, current state active+undersized, last acting [5,2]
    pg 2.18 is stuck undersized for 44m, current state active+undersized, last acting [5,2]
    pg 2.19 is stuck undersized for 44m, current state active+undersized, last acting [0,3]
    pg 2.1a is stuck undersized for 44m, current state active+undersized, last acting [3,2]
    pg 2.1b is stuck undersized for 44m, current state active+undersized, last acting [2,5]
    pg 2.1c is stuck undersized for 44m, current state active+undersized, last acting [5,4]
    pg 2.1d is stuck undersized for 44m, current state active+undersized, last acting [3,0]
    pg 2.1e is stuck undersized for 44m, current state active+undersized, last acting [2,5]
    pg 2.1f is stuck undersized for 44m, current state active+undersized, last acting [4,3]

I do have OSDs up

 ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME                  STATUS  REWEIGHT  PRI-AFF
-1         10.47958  root default
-3          5.23979      host hostname1
 0    ssd   1.74660          osd.0                  up   1.00000  1.00000
 2    ssd   1.74660          osd.2                  up   1.00000  1.00000
 4    ssd   1.74660          osd.4                  up   1.00000  1.00000
-5          5.23979      host hostname2
 1    ssd   1.74660          osd.1                  up   1.00000  1.00000
 3    ssd   1.74660          osd.3                  up   1.00000  1.00000
 5    ssd   1.74660          osd.5                  up   1.00000  1.00000
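
To see why the PGs stay undersized, the pool's replica count and CRUSH failure domain can be checked (the pool name replicapool below is an assumption from the default Rook example; substitute your own):

ceph osd pool get replicapool size
ceph osd pool get replicapool crush_rule
ceph osd crush rule dump    # look for "type": "host" in the chooseleaf step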

1 Answer

You should set accessModes to ReadWriteOnce when using RBD with volumeMode: Filesystem; ReadWriteMany on RBD is only supported for raw block volumes (volumeMode: Block), which is exactly what the error message is telling you. If you need a shared filesystem with ReadWriteMany, use CephFS instead. Also, because your pool's replica count is 3 and the failure domain (the level at which Ceph places each copy of the data) is host, you need at least 3 hosts with OSDs; add a third node, or lower the pool's replica size, to resolve the stuck PGs.
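
A minimal sketch of the corrected PVC, assuming the same claim name and StorageClass. Either switch to ReadWriteOnce, which is what RBD supports for filesystem-mode volumes:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: avl-vam-pvc-media-ceph
spec:
  storageClassName: "rook-ceph-block"
  accessModes:
    - ReadWriteOnce   # single-node access; supported for RBD filesystem-mode volumes
  resources:
    requests:
      storage: 10Gi

or, if multi-node access to the same RBD image is really required, keep ReadWriteMany but request a raw block device, which is the "rbd `block` type volumes" case the error refers to (the consuming pods must then coordinate access themselves):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: avl-vam-pvc-media-ceph
spec:
  storageClassName: "rook-ceph-block"
  volumeMode: Block         # raw block device instead of a filesystem
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

For the undersized PGs, the Rook-managed pool spec can be adjusted instead of adding hardware; this is a sketch assuming the default CephBlockPool named replicapool in the rook-ceph namespace (keeping size: 3 and adding a third host with OSDs is the better fix for redundancy):

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 2    # only two hosts exist, so three host-level replicas can never be placed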