Why do my Grafana Tempo ingester pods go into a Back-off restarting state after max_block_duration?


I am using the grafana tempo-distributed Helm chart. It is deployed successfully, its backend is configured to use Azure Storage (blob containers), and it is working fine.
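For context, the backend is wired up through Tempo's storage.trace block, roughly like the minimal sketch below. The account, key, and container names are placeholders, and the exact way these values are templated depends on the tempo-distributed chart version:

```yaml
# Sketch of the Azure backend section of the rendered tempo.yaml.
# All names and the key reference below are placeholders.
storage:
  trace:
    backend: azure
    azure:
      container_name: tempo-traces
      storage_account_name: mystorageaccount
      storage_account_key: ${STORAGE_ACCOUNT_KEY}
```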

I have a demo application that is sending traces to Tempo, and I can confirm that the traces are being received.

The issue I have observed is that exactly after 30m, my ingester pods go into a Back-off restarting state, and I have to manually restart their StatefulSet.

While searching for the root cause, I found that there is a parameter max_block_duration with a default value of 30m: "max_block_duration: maximum length of time before cutting a block."

So I tried increasing it to 60m, and now the ingester pods go into the Back-off restarting state after 60 minutes instead.
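In case it helps, this is roughly the block I am changing in the rendered tempo.yaml. Only max_block_duration is the setting discussed above; the other values are illustrative, not my exact settings, and the way you override them depends on your chart version:

```yaml
# Sketch of the ingester section of tempo.yaml (rendered by the Helm chart).
ingester:
  max_block_duration: 60m       # was 30m; raising it only delays the restarts
  max_block_bytes: 524288000    # cut a block early if it grows past this size
  complete_block_timeout: 15m   # how long completed blocks are kept in the ingester
```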

I have also enabled autoscaling, but no new pods come up when all of the ingester pods are in the same error state.

Can someone help me understand why this is happening, and what the possible solution is to eliminate the issue?

What value should be passed to max_block_duration so that these pods do not go into Back-off restarting?

I expect my ingester pods to keep running fine at all times.


There is 1 best solution below.

BEST ANSWER

I also opened a GitHub issue on Tempo, and the issue no longer exists at my end. If someone else is facing the same problem, you can take a look at my GitHub issue for more insight: https://github.com/grafana/tempo/issues/2488