Add a Persistent Volume Claim to a Kubernetes Dask Cluster

I am running a Dask cluster and a Jupyter notebook server on cloud resources using Kubernetes and Helm.

I am using a YAML values file for the Dask cluster and Jupyter, initially taken from https://docs.dask.org/en/latest/setup/kubernetes-helm.html:

apiVersion: v1
kind: Pod
worker:
  replicas: 2 #number of workers
  resources:
    limits:
      cpu: 2
      memory: 2G
    requests:
      cpu: 2
      memory: 2G
  env:
    - name: EXTRA_PIP_PACKAGES
      value: s3fs --upgrade
# We want to keep the same packages on the workers and jupyter environments
jupyter:
  enabled: true
  env:
    - name: EXTRA_PIP_PACKAGES
      value: s3fs --upgrade
  resources:
    limits:
      cpu: 1
      memory: 2G
    requests:
      cpu: 1
      memory: 2G
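
For context, a values file like this gets applied when installing the Dask Helm chart, roughly as described in the linked docs (the release name my-dask below is just an example):

# Register the Dask chart repository and install the chart with the values file above
helm repo add dask https://helm.dask.org/
helm repo update
helm install my-dask dask/dask -f values.yaml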

and I am using another YAML file to create the storage locally:

#CREATE A PERSISTENT VOLUME CLAIM // attached to our pod config
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dask-cluster-persistent-volume-claim
spec:
  accessModes:
    - ReadWriteOnce # ReadWriteOnce: can be used by a single node - ReadOnlyMany: read-only by many nodes - ReadWriteMany: read/written by many nodes
  resources:
    requests:
      storage: 2Gi # storage capacity

I would like to add a persistent volume claim to the first YAML file, but I couldn't figure out where to add the volumes and volumeMounts. If you have an idea, please share it, thank you.
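
For reference, the full set of keys the chart accepts (and hence where volumes and volumeMounts could live) can be listed by dumping its default values, assuming the repository was added under the name dask:

helm show values dask/dask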

Best Answer

I started by creating a PVC with the following YAML file:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pdask-cluster-persistent-volume-claim
spec:
  accessModes:
    - ReadWriteOnce # ReadWriteOnce: can be used by a single node - ReadOnlyMany: read-only by many nodes - ReadWriteMany: read/written by many nodes
  resources: # https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
    requests:
      storage: 2Gi

launching it in bash:

kubectl apply -f Dask-Persistent-Volume-Claim.yaml
#persistentvolumeclaim/pdask-cluster-persistent-volume-claim created

I checked the creation of the persistent volume:

kubectl get pv
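
The claim itself can also be inspected directly; the output below is only illustrative, since the bound volume name and storage class depend on the cluster:

kubectl get pvc pdask-cluster-persistent-volume-claim
# NAME                                    STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
# pdask-cluster-persistent-volume-claim   Bound    pvc-...  2Gi        RWO            standard       1m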

I made major changes to the Dask cluster YAML: I added the volumes and volumeMounts so that the workers and Jupyter read/write a directory /save_data from the persistent volume created previously, and I set the serviceType to LoadBalancer with a port:

apiVersion: v1
kind: Pod
scheduler:
  name: scheduler 
  enabled: true
  image:
    repository: "daskdev/dask"
    tag: 2021.8.1
    pullPolicy: IfNotPresent
  replicas: 1  #(should always be 1).
  serviceType: "LoadBalancer" # Scheduler service type. Set to `LoadBalancer` to expose outside of your cluster.
  # serviceType: "NodePort"
  # serviceType: "ClusterIP"
  #loadBalancerIP: null  # Some cloud providers allow you to specify the loadBalancerIP when using the `LoadBalancer` service type. If your cloud does not support it this option will be ignored.
  servicePort: 8786 # Scheduler service internal port.
# DASK WORKERS
worker:
  name: worker  # Dask worker name.
  image:
    repository: "daskdev/dask"  # Container image repository.
    tag: 2021.8.1  # Container image tag.
    pullPolicy: IfNotPresent  # Container image pull policy.
    dask_worker: "dask-worker"  # Dask worker command. E.g `dask-cuda-worker` for GPU worker.
  replicas: 2
  resources:
    limits:
      cpu: 2
      memory: 2G
    requests:
      cpu: 2
      memory: 2G
  mounts: # Worker Pod volumes and volume mounts; mounts.volumes follows the Kubernetes API v1 Volumes spec, mounts.volumeMounts follows the Kubernetes API v1 VolumeMount spec
    volumes:
      - name: dask-storage
        persistentVolumeClaim:
          claimName: pdask-cluster-persistent-volume-claim
    volumeMounts:
      - name: dask-storage
        mountPath: /save_data # folder for storage
  env:
    - name: EXTRA_PIP_PACKAGES
      value: s3fs --upgrade
# We want to keep the same packages on the worker and jupyter environments
jupyter:
  name: jupyter  # Jupyter name.
  enabled: true  # Enable/disable the bundled Jupyter notebook.
  #rbac: true  # Create RBAC service account and role to allow Jupyter pod to scale worker pods and access logs.
  image:
    repository: "daskdev/dask-notebook"  # Container image repository.
    tag: 2021.8.1  # Container image tag.
    pullPolicy: IfNotPresent  # Container image pull policy.
  replicas: 1  # Number of notebook servers.
  serviceType: "LoadBalancer" # Scheduler service type. Set to `LoadBalancer` to expose outside of your cluster.
  # serviceType: "NodePort"
  # serviceType: "ClusterIP"
  servicePort: 80  # Jupyter service internal port.
  # This hash corresponds to the password 'dask'
  #password: 'sha1:aae8550c0a44:9507d45e087d5ee481a5ce9f4f16f37a0867318c' # Password hash.
  env:
    - name: EXTRA_PIP_PACKAGES
      value: s3fs --upgrade
  resources:
    limits:
      cpu: 1
      memory: 2G
    requests:
      cpu: 1
      memory: 2G
  mounts: # Jupyter Pod volumes and volume mounts; mounts.volumes follows the Kubernetes API v1 Volumes spec, mounts.volumeMounts follows the Kubernetes API v1 VolumeMount spec
    volumes:
      - name: dask-storage
        persistentVolumeClaim:
          claimName: pdask-cluster-persistent-volume-claim
    volumeMounts:
      - name: dask-storage
        mountPath: /save_data # folder for storage
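
For orientation, the mounts.volumes and mounts.volumeMounts entries above map onto the standard pod-level volume fields that the chart renders into the worker and jupyter pods; a rough sketch of the equivalent plain pod spec fragment (illustrative only, not the chart's actual rendered manifest):

# Equivalent plain Kubernetes pod spec fragment (sketch)
spec:
  volumes:
    - name: dask-storage
      persistentVolumeClaim:
        claimName: pdask-cluster-persistent-volume-claim
  containers:
    - name: worker
      volumeMounts:
        - name: dask-storage
          mountPath: /save_data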

Then, I installed my Dask configuration using Helm:

helm install my-config dask/dask -f values.yaml
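
If the release already exists, the updated values can be applied with an upgrade instead of a fresh install; a minimal sketch assuming the same release and chart names:

helm upgrade my-config dask/dask -f values.yaml
helm status my-config   # check that the release deployed
kubectl get pods        # scheduler, workers and jupyter should be Running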

Finally, I accessed my Jupyter pod interactively:

kubectl exec -ti [pod-name] -- /bin/bash

to examine the existence of the /save_data folder.
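
For example, a quick check from inside the pod (keeping the same pod-name placeholder as above) could look like this:

kubectl exec -ti [pod-name] -- ls -la /save_data
kubectl exec -ti [pod-name] -- df -h /save_data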