I am using the pushgateway to expose metrics coming from short-lived batch jobs. At the moment the pushgateway instance runs on a bare-metal machine, with a Docker volume mounted so the metrics survive a container restart (in conjunction with the --persistence.file parameter).
Here is an extract of the docker-compose.yml file used to run the container:
```yaml
pushgateway:
  image: prom/pushgateway:v1.2.0
  restart: unless-stopped
  volumes:
    - pushgw-data:/data
  ports:
    - "${PUSHGW_PORT:-9091}:9091"
  command: --persistence.file="/data/metric.store"
```
I am moving to a (private) kubernetes cluster without persistent volumes, but equipped with an s3-compatible object storage.
From this issue on GitHub it seems possible to target S3 for the checkpointing, but without further input I am not sure how to achieve this, and that's the best I could find by searching the Web.
Can anyone point me in the right direction?
So finally https://serverfault.com/questions/976764/kubernetes-run-aws-s3-sync-rsync-against-persistent-volume-on-demand pointed me in the right direction.
This is an extract of the deployment.yaml descriptor which works as expected. Note the override of the entrypoint for the pushgateway Docker image. In my case I added a 10-second delay at start; you might need to tune the delay to suit your needs. This delay is needed because the pushgateway container boots faster than the sidecar (also due to the network exchange with S3, I suppose).
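The original descriptor is not reproduced above, so here is a minimal sketch of the pattern it describes: an emptyDir volume shared between the pushgateway and an S3-sync sidecar, with the pushgateway's entrypoint overridden to wait for the sidecar's initial download. The sidecar image, bucket name, sync interval, and credentials wiring (normally injected via a Secret) are all placeholders, not the original values:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pushgateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pushgateway
  template:
    metadata:
      labels:
        app: pushgateway
    spec:
      volumes:
        - name: metrics
          emptyDir: {}          # shared between pushgateway and the sidecar
      containers:
        - name: pushgateway
          image: prom/pushgateway:v1.2.0
          # Entrypoint override: delay startup so the sidecar has time to
          # restore the metric store from S3 before persistence kicks in.
          command: ["/bin/sh", "-c"]
          args:
            - |
              sleep 10
              exec /bin/pushgateway --persistence.file=/data/metric.store
          ports:
            - containerPort: 9091
          volumeMounts:
            - name: metrics
              mountPath: /data
        - name: s3-sync
          image: amazon/aws-cli    # hypothetical sidecar image
          command: ["/bin/sh", "-c"]
          args:
            # Restore the store on startup (tolerate a missing object on
            # first run), then push local changes back periodically.
            # s3://my-bucket is a placeholder bucket.
            - |
              aws s3 cp s3://my-bucket/metric.store /data/metric.store || true
              while true; do
                sleep 30
                aws s3 cp /data/metric.store s3://my-bucket/metric.store
              done
          volumeMounts:
            - name: metrics
              mountPath: /data
```

With an S3-compatible store you would typically also point the sidecar at the endpoint via `aws --endpoint-url ...` and supply credentials through environment variables from a Secret.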
If the pushgateway starts when no metric store file is present yet, the file won't be used/considered. But it gets worse: as soon as you first send data to the pushgateway, it will overwrite the file. At that point the "sync" from the sidecar container will also overwrite the original copy, so pay attention and make sure you have a backup of the metrics file before experimenting with this delay value.