thanos-store pod in AWS EKS keeps crashing with S3 "Access Denied" log


I am attempting to get a kube-thanos (https://github.com/thanos-io/kube-thanos) implementation working in an AWS EKS cluster.

I am implementing a "remote write" setup with S3 integration, thanos-receive and thanos-store, and no Prometheus sidecar.

Everything seems to come up fine, but the thanos-store pod keeps crashing, logging err="bucket store initial sync: sync block: BaseFetcher: iter bucket: Access Denied".

I am using the AWS IRSA (IAM Roles for Service Accounts) method to give the thanos-store pod access to S3.
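With IRSA, the pod exchanges its projected service-account token for temporary role credentials through sts:AssumeRoleWithWebIdentity. That exchange only succeeds if the role's trust policy names the cluster's OIDC provider and, usually, a sub condition of the form system:serviceaccount:<namespace>:<service-account>. A quick offline check is sketched below in Python, assuming the trust policy has been exported as JSON (for example from the output of aws iam get-role) and loaded into a dict. The service-account name thanos-store, the account ID and the OIDC provider ID are hypothetical placeholders; the namespace monitoring is taken from the pod listing further down. Only a list-valued Statement and StringEquals conditions are handled.

```python
"""Offline sanity check: does an IAM trust policy admit an EKS service account?

Hypothetical values: OIDC provider host, account ID and service-account name
are placeholders, not taken from the original question.
"""


def _as_list(value) -> list:
    # IAM allows a single string or a list for Action and condition values.
    return [value] if isinstance(value, str) else list(value)


def expected_subject(namespace: str, service_account: str) -> str:
    # IRSA tokens carry a "sub" claim of exactly this form.
    return f"system:serviceaccount:{namespace}:{service_account}"


def trust_allows(policy: dict, oidc_host: str, namespace: str, service_account: str) -> bool:
    """Return True if an Allow statement permits AssumeRoleWithWebIdentity
    from the given OIDC provider for the given service account."""
    wanted = expected_subject(namespace, service_account)
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRoleWithWebIdentity" not in _as_list(stmt.get("Action", [])):
            continue
        principal = stmt.get("Principal", {})
        federated = principal.get("Federated", "") if isinstance(principal, dict) else ""
        if not federated.endswith(f":oidc-provider/{oidc_host}"):
            continue
        subs = stmt.get("Condition", {}).get("StringEquals", {}).get(f"{oidc_host}:sub")
        # No sub condition means any service account in the cluster may assume the role.
        if subs is None or wanted in _as_list(subs):
            return True
    return False
```

If this returns False for the thanos-store service account, the annotation alone will not help, because STS will refuse the web-identity exchange.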

I have a "thanos" role with the required S3 permissions and the role is properly annotated on the thanos-store service account.
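The failing step is "iter bucket", which is a listing operation. S3 authorizes listing with s3:ListBucket on the bucket ARN itself (arn:aws:s3:::gd9-thanos), while object reads and writes are authorized on arn:aws:s3:::gd9-thanos/*, so a policy that only names the /* resource can deny the listing even though object access looks correct. The Python sketch below is a rough offline check of a permissions policy loaded as a dict. It assumes the four actions listed are the set Thanos needs (worth confirming against the Thanos object-storage documentation for your version), ignores Deny statements, NotAction/NotResource and conditions, and uses a hypothetical object key for the per-object checks.

```python
from fnmatch import fnmatchcase


def _as_list(value) -> list:
    return [value] if isinstance(value, str) else list(value)


def allows(policy: dict, action: str, resource: str) -> bool:
    """Rough check: does any Allow statement cover `action` on `resource`?
    IAM wildcards (*, ?) are approximated with fnmatch; actions are case-insensitive."""
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(stmt.get("Action", [])))
        resource_ok = any(fnmatchcase(resource, r) for r in _as_list(stmt.get("Resource", [])))
        if action_ok and resource_ok:
            return True
    return False


def missing_thanos_permissions(policy: dict, bucket: str) -> list[str]:
    bucket_arn = f"arn:aws:s3:::{bucket}"
    sample_object = f"{bucket_arn}/example-key"  # hypothetical object key
    needed = [
        ("s3:ListBucket", bucket_arn),  # listing is checked against the bucket ARN
        ("s3:GetObject", sample_object),
        ("s3:PutObject", sample_object),
        ("s3:DeleteObject", sample_object),
    ]
    return [f"{action} on {resource}" for action, resource in needed if not allows(policy, action, resource)]
```

An empty list means the policy at least covers the listing and object calls; a result of ["s3:ListBucket on arn:aws:s3:::gd9-thanos"] points at the bucket-ARN mistake described above.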

The --objstore.config=$(OBJSTORE_CONFIG) flag points to a Kubernetes secret built from this YAML:

type: S3
config:
  bucket: gd9-thanos
  endpoint: s3.us-east-2.amazonaws.com

When the thanos-store pod comes up (before it crashes), it appears to have all the environment variables needed for IRSA to work:

    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: us-east-2
    - name: AWS_REGION
      value: us-east-2
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::xxxxxxxxxxx:role/thanos
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
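Having the variables present does not by itself show which identity the token carries. A small Python sketch, run in a debug pod that uses the same service account (the Thanos image itself may not ship Python), can confirm that AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE are set and decode the token's sub and aud claims for comparison with the role's trust policy. The token is decoded without signature verification, which is acceptable for inspection only; for IRSA the aud claim is normally sts.amazonaws.com. The function names are hypothetical.

```python
import base64
import json
import os

# The two variables the AWS web-identity credential provider reads.
REQUIRED = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def missing_irsa_vars(env: dict) -> list[str]:
    return [name for name in REQUIRED if not env.get(name)]


def jwt_claims(token: str) -> dict:
    """Decode (without verifying) the payload segment of a JWT."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def describe_token(env: dict) -> dict:
    with open(env["AWS_WEB_IDENTITY_TOKEN_FILE"]) as fh:
        claims = jwt_claims(fh.read().strip())
    return {"sub": claims.get("sub"), "aud": claims.get("aud")}


if __name__ == "__main__":
    env = dict(os.environ)
    missing = missing_irsa_vars(env)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print(describe_token(env))
```

The printed sub should equal system:serviceaccount:monitoring:<service-account-name>, matching the sub condition in the role's trust policy character for character.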

I have tried a few of the suggestions here but nothing seems to work.

Can anyone suggest how to further troubleshoot?

 k get pods -n monitoring | grep thanos
thanos-query-75f5bbf7c-62528                                1/1     Running            0             4h10m
thanos-receive-ingestor-default-0                           1/1     Running            0             4h10m
thanos-receive-router-76576bf5cb-ld6jh                      1/1     Running            0             4h10m
thanos-store-0                                              0/1     CrashLoopBackOff   8 (44s ago)   21m

Thanks for any suggestions!

Answer by Dmytro Sirant:

The problem might be related to IMDSv2 being enabled on your worker nodes. See https://github.com/thanos-io/thanos/issues/3143
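For context on why IMDSv2 can matter: IMDSv2 requires a session token obtained with a PUT request, and when the instance's metadata hop limit is 1 that request generally cannot be answered for traffic coming from a pod's network namespace. If the S3 client in Thanos ends up on the instance-metadata credential path instead of the web-identity one, it could be left without usable credentials, which may surface as Access Denied. A minimal Python probe, assuming Python is available in a debug pod scheduled on an affected node, checks whether an IMDSv2 token can be fetched at all. The endpoint and TTL header are the documented IMDSv2 ones; the function names are hypothetical.

```python
import urllib.request

IMDS_TOKEN_URL = "http://169.254.169.254/latest/api/token"


def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    # IMDSv2: a session token must first be requested with PUT and a TTL header.
    return urllib.request.Request(
        IMDS_TOKEN_URL,
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def imds_v2_reachable(timeout: float = 2.0) -> bool:
    """Return True if an IMDSv2 session token can be fetched from here."""
    try:
        with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
            return resp.status == 200 and bool(resp.read())
    except OSError:
        # URLError, connection errors and socket timeouts are all OSError subclasses.
        return False
```

Calling imds_v2_reachable() from a pod on the node returns False when the token request cannot be completed, for example because of the hop limit or because metadata access is disabled; that result is only a hint, since Thanos should not need instance metadata at all once IRSA credentials are picked up.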