Checkpoint pushgateway persistence file to object store

I am using pushgateway to expose metrics coming from short-lived batch jobs.

At the moment the pushgateway instance runs on a bare-metal machine, where I have a Docker volume mounted so that metrics survive a container restart (in conjunction with the --persistence.file parameter).

Here is an extract of the docker-compose.yml file used to run the container:

  pushgateway:
    image: prom/pushgateway:v1.2.0
    restart: unless-stopped
    volumes:
      - pushgw-data:/data
    ports:
      - "${PUSHGW_PORT:-9091}:9091"
    command: --persistence.file="/data/metric.store"
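
As an aside, pushgateway only writes the persistence file out periodically (via --persistence.interval, which I believe defaults to 5 minutes), so metrics pushed shortly before a crash may not have made it into the file yet. If that window matters, the interval can be shortened; an illustrative variant of the command line above (the 2m value is just an example):

    # flush the in-memory metric state to the persistence file more often than the default
    command: --persistence.file="/data/metric.store" --persistence.interval=2m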

I am moving to a (private) Kubernetes cluster without persistent volumes, but one equipped with S3-compatible object storage.

From this issue on GitHub it seems possible to target S3 for the checkpointing, but without further input I am not sure how to achieve it, and that is the best I could find by searching the web.

Can anyone point me in the right direction?

1 Answer

So finally https://serverfault.com/questions/976764/kubernetes-run-aws-s3-sync-rsync-against-persistent-volume-on-demand pointed me in the right direction.

This is an extract of the deployment.yaml descriptor which works as expected:

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: {{K8S_NAMESPACE}}
  name: {{K8S_DEPLOYMENT_NAME}}
spec:
  selector:
    matchLabels:
      name: {{K8S_DEPLOYMENT_NAME}}
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        name: {{K8S_DEPLOYMENT_NAME}}
        version: v1
    spec:
      containers:
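      # Sidecar: restores /data from the S3 bucket at startup, then syncs /data back to the bucket every 60 seconds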
      - name: {{AWSCLI_NAME}}
        image: {{IMAGE_AWSCLI}}
        env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: {{SECRET_NAME}}
              key: accesskey
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: {{SECRET_NAME}}
              key: secretkey
        command: [ "/bin/bash",
                   "-c",
                   "aws --endpoint-url {{ENDPOINT_URL}} s3 sync s3://{{BUCKET}} /data; while true; do aws --endpoint-url {{ENDPOINT_URL}} s3 sync /data s3://{{BUCKET}}; sleep 60; done" ]
        volumeMounts:
          - name: pushgw-data
            mountPath: /data
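      # Pushgateway itself, started with a delay so the sidecar can restore the persistence file first (see note below)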
      - name: {{PUSHGATEWAY_NAME}}
        image: {{IMAGE_PUSHGATEWAY}}
        command: [ '/bin/sh', '-c' ]
        args: [ 'sleep 10; /bin/pushgateway --persistence.file=/data/metric.store' ]
        ports:
        - containerPort: 9091
        volumeMounts:
        - name: pushgw-data
          mountPath: /data
      volumes:
        - name: pushgw-data
          emptyDir: {}
        - name: config-volume
          configMap:
            name: {{K8S_DEPLOYMENT_NAME}}
      imagePullSecrets:
        - name: harbor-bot
      restartPolicy: Always
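
The {{SECRET_NAME}} secret referenced above only needs accesskey and secretkey entries matching the secretKeyRef lookups; a minimal sketch for creating it (the literal values are placeholders):

# create the credentials secret consumed by the awscli sidecar
kubectl create secret generic {{SECRET_NAME}} \
  --namespace {{K8S_NAMESPACE}} \
  --from-literal=accesskey=<your-access-key> \
  --from-literal=secretkey=<your-secret-key>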

Note the override of the entrypoint for the pushgateway Docker image. In my case I have put a 10-second delay before start; you might need to tune the delay to suit your needs. The delay is needed because the pushgateway container boots faster than the sidecar (also due to the network exchange with S3, I suppose).
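
If the fixed sleep turns out to be fragile, one possible alternative (a sketch only, assuming the sidecar restores /data/metric.store as above; the 30-second upper bound is arbitrary) is to wait for the restored file, with a timeout so the very first run against an empty bucket still starts:

      - name: {{PUSHGATEWAY_NAME}}
        image: {{IMAGE_PUSHGATEWAY}}
        command: [ '/bin/sh', '-c' ]
        # wait up to ~30s for the sidecar to restore the persistence file, then start anyway
        args: [ 'i=0; while [ ! -f /data/metric.store ] && [ $i -lt 30 ]; do sleep 1; i=$((i+1)); done; exec /bin/pushgateway --persistence.file=/data/metric.store' ]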

If the pushgateway starts when no metric store file is present yet, the file won't be used/considered. It gets worse: the first time you send data to the pushgateway, it will overwrite the file. At that point the "sync" from the sidecar container will also overwrite the original "copy" in the bucket, so pay attention and make sure you have a backup of the metrics file before experimenting with this delay value.
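
For that backup, a one-off copy of the current object out of the bucket is enough; for example, with the same CLI and endpoint the sidecar uses (bucket and endpoint placeholders as above):

# download the current persistence file from the bucket before experimenting
aws --endpoint-url {{ENDPOINT_URL}} s3 cp s3://{{BUCKET}}/metric.store ./metric.store.bak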