I'm currently using the GCE standard container cluster with a lot of success and pleasure, but I have a question about the provisioning of GCE persistent disks.
As described in this document from Kubernetes, I created two YAML files:
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
  name: slow
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
and
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
If I now create the following volume claim:
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "claim-test",
    "annotations": {
      "volume.beta.kubernetes.io/storage-class": "slow"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "3Gi"
      }
    }
  }
}
The disk gets created perfectly! And if I now start the following replication controller:
apiVersion: v1
kind: ReplicationController
metadata:
  name: nfs-server
spec:
  replicas: 1
  selector:
    role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
        - name: nfs-server
          image: gcr.io/google_containers/volume-nfs
          ports:
            - name: nfs
              containerPort: 2049
            - name: mountd
              containerPort: 20048
            - name: rpcbind
              containerPort: 111
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /exports
              name: mypvc
      volumes:
        - name: mypvc
          persistentVolumeClaim:
            claimName: claim-test
The disk usually gets mounted perfectly, but many times I stumble upon the following error (nothing more can be found in the kubelet.log file):
Failed to attach volume "claim-test" on node "...." with: GCE persistent disk not found: diskName="....." zone="europe-west1-b"
Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "....". list of unattached/unmounted volumes=[....]
Sometimes the pod boots perfectly, but sometimes it fails to start. The only thing I could find is that there needs to be enough time between creating the PVC and creating the RC. I tried this many times, but the results stayed just as inconsistent.
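To take the guesswork out of "enough time", the wait can be made explicit by polling the claim until it reports the Bound phase before creating the RC. This is only a minimal sketch, not a confirmed fix for the error above: it assumes kubectl is on the PATH and already pointed at the cluster, and the helper names (pvc_phase, wait_until, create_rc_when_bound) and the manifest path argument are hypothetical.

```python
import json
import subprocess
import time


def pvc_phase(name, namespace="default"):
    """Return the claim's status.phase as reported by kubectl (e.g. "Pending", "Bound")."""
    out = subprocess.check_output(
        ["kubectl", "get", "pvc", name, "--namespace", namespace, "-o", "json"]
    )
    return json.loads(out).get("status", {}).get("phase")


def wait_until(predicate, timeout=120.0, interval=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it is truthy; return False if the timeout expires first."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def create_rc_when_bound(claim, rc_manifest):
    """Create the RC from rc_manifest (a hypothetical file path) once the claim is Bound."""
    if not wait_until(lambda: pvc_phase(claim) == "Bound"):
        raise RuntimeError("claim %r did not reach Bound in time" % claim)
    subprocess.check_call(["kubectl", "create", "-f", rc_manifest])
```

Calling create_rc_when_bound("claim-test", "nfs-server-rc.yaml") would replace a fixed sleep between the two create steps. If the attach error still shows up after the claim is Bound, the timing between PVC and RC is probably not the cause.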
I hope someone can give me some kind of suggestion or help.
Thanks in advance! Best regards,
Hacor
Thanks for your comments! After a few days of searching I was finally able to determine what the problem was. I'm posting it because it may be useful to other users.
I was using the NFS example for Kubernetes as a replication controller to provide my apps with NFS storage. It seems that when the NFS server and the PV and PVC are deleted, the NFS share sometimes gets stuck on the node itself. I think this happens because I didn't delete these elements in a particular order, so the node was left with a stuck share and became incapable of mounting new shares to itself or to the pod.
I noticed that the problem always occurred after I deleted some app (NFS, PV, PVC and other components) from the cluster. If I create a new cluster on GCE, creating apps works perfectly until I delete one, and then it goes wrong.
I don't know for sure what the correct deletion order is, but I think the following happens: if the pod takes longer to delete and isn't completely gone before the PV is deleted, the node hangs with a mount it can't remove because it's still in use, and that's where the problems occur.
I must honestly say that I'm now moving to an externally provisioned GlusterFS cluster. Hope this helps someone!
Regards,
Hacor