Does AWS Athena partition projection support more than one `storage.location.template`?

538 Views Asked by At

AWS ControlTower managed CloutTrail created account-trail-logs which used /org id/AWSLogs/… log path in S3 bucket, until Landing Zone 3.0 update which replaced it with organization-trail logs whose new log path is /org id/AWSLogs/org id/….

https://docs.aws.amazon.com/controltower/latest/userguide/2022-all.html

This poses challenge with the AWS Athena partition projection. The original DDL is as follows:

CREATE EXTERNAL TABLE cloudtrail_logs_partition_projected( 
    eventVersion STRING,
    userIdentity STRUCT< 
        type: STRING, 
        principalId: STRING, 
        arn: STRING, 
        accountId: STRING, 
        invokedBy: STRING, 
        accessKeyId: STRING, 
        userName: STRING, 
        sessionContext: STRUCT< 
            attributes: STRUCT< mfaAuthenticated: STRING, creationDate: STRING>, 
            sessionIssuer: STRUCT< type: STRING, principalId: STRING, arn: STRING, accountId: STRING, userName: STRING>>>,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIpAddress STRING,
    userAgent STRING, 
    errorCode STRING, 
    errorMessage STRING, 
    requestParameters STRING, 
    responseElements STRING, 
    additionalEventData STRING, 
    requestId STRING, 
    eventId STRING, 
    readOnly STRING, 
    resources ARRAY<STRUCT< arn: STRING, accountId: STRING, type: STRING>>,
    eventType STRING, 
    apiVersion STRING, 
    recipientAccountId STRING, 
    serviceEventDetails STRING, 
    sharedEventID STRING, 
    vpcEndpointId STRING )
PARTITIONED BY ( `accountid` string, `region` string, `date_created` string)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://<s3-bucket>/<org-id>/AWSLogs/'
TBLPROPERTIES (
  'projection.enabled'='true', 
  'projection.accountid.type'='injected',
  'projection.region.type'='enum', 
  'projection.region.values'='eu-north-1,ap-south-1,eu-west-3,eu-west-2,eu-west-1,ap-northeast-3,ap-northeast-2,ap-northeast-1,sa-east-1,ca-central-1,ap-southeast-1,ap-southeast-2,eu-central-1,us-east-1,us-east-2,us-west-1,us-west-2', 
  'projection.date_created.format'='yyyy/MM/dd', 
  'projection.date_created.interval'='1', 
  'projection.date_created.interval.unit'='DAYS', 
  'projection.date_created.range'='2021/01/01,NOW', 
  'projection.date_created.type'='date', 
  'storage.location.template'='s3://<s3-bucket-name>/<org-id>/AWSLogs/${accountid}/CloudTrail/${region}/${date_created}')

Since the LOCATION and storage.location.template are different for the older and newer s3 objects (i.e., logs from CloudTrail), what is the best solution to query the cloudtrail logs, old and new? I prefer a single Athena Table for both older and newer logs, but I am not sure if multiple LOCATIONS are supported.

0

There are 0 best solutions below