AWS S3 Connection in druid

3.4k Views Asked by At

I have set up a clustered Druid with the configuration as mentioned in the Druid documentation

https://druid.apache.org/docs/latest/tutorials/cluster.html

I am using AWS S3 for deep storage. Following is the snippet of my common configuration file

druid.extensions.loadList=["druid-datasketches", "mysql-metadata-storage", "druid-s3-extensions", "druid-orc-extensions", "druid-lookups-cached-global"]
# For S3:
druid.storage.type=s3
druid.storage.bucket=bucket-name
druid.storage.baseKey=druid/segments
#druid.storage.disableAcl=true
druid.storage.sse.type=s3
#druid.s3.accessKey=...
#druid.s3.secretKey=...

# For S3:
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=bucket-name
druid.indexer.logs.s3Prefix=druid/stage/indexing-logs

While running any ingestion task I am getting Access denied error

Java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ; S3 Extended Request ID: ), S3 Extended Request ID: 
    at org.apache.druid.storage.s3.S3DataSegmentPusher.push(S3DataSegmentPusher.java:103) ~[?:?]
    at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$mergeAndPush$4(AppenderatorImpl.java:791) ~[druid-server-0.19.0.jar:0.19.0]
    at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:87) ~[druid-core-0.19.0.jar:0.19.0]
    at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:115) ~[druid-core-0.19.0.jar:0.19.0]
    at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:105) ~[druid-core-0.19.0.jar:0.19.0]

I am using s3 for two purposes

  1. read data from s3 and ingest it. This connection is working fine and data is being from s3 location
  2. for deep storage. I am getting error over here.

I am using Profile information authentication method to provide s3 credential. So I already have configured aws cli with appropriate credentials. Also, s3 data is encrypted by AES256 so i have added druid.storage.sse.type=s3 in config file.

Can someone help me out here as I am not able to debug the issue.

1

There are 1 best solutions below

0
On

You asked how to approach debugging this. Normally I would:

  1. Ssh onto the ec2 instance and run aws sts get-caller-identity. This will tell you what principal your requests are sent from. Then, I would confirm that principal has the S3 access that is expected.
  2. I would confirm that I can write to the bucket in your configuration.
druid.storage.type=s3
druid.storage.bucket=<bucket-name>
druid.storage.baseKey=druid/segments
  1. I would try some of the other auth methods such as exporting the keys into the environment mentioned in the third option since that is a simple test. Then I would run step 1 again to confirm my principal reflects those keys. And then I would try running your code again. enter image description here