Lookup Key from S3 Bucket using Boto


I've taken a script written by Paul Davies about reingesting Splunk Logs from the AWS Cloud.

When my logs fail to process in Kinesis Firehose, they are placed in a backup S3 bucket. The current format of the key is the following:

Folder/Folder/Year/Month/Day/HH/failedlogs

Example:

splunk-kinesis-firehose/splunk-failed/2023/01/01/01/failedlogs.gz

The key lookup in the script is set like this:

key=urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')

Is there a way to get all the files within the S3 bucket under the subfolder splunk-kinesis-firehose, or is there a better way of looping through all the folders?

There are 2 best solutions below

Pierre D

As John Rotenstein says, your Lambda function, if invoked by an S3 trigger, will receive the key as part of the request. You could also invoke the Lambda manually and pass the key in the request.

But if, for some reason, you want to do a full (or partial) listing under a path, then take a look at the s3list() function that I describe in this SO post. It is a fairly general S3 lister. In your case, you would call it with:

bucket = boto3.resource('s3').Bucket('bucket-name')
path = 'splunk-kinesis-firehose/splunk-failed'

for s3obj in s3list(bucket, path, list_dirs=False):
    key = s3obj.key
    ...

to get all the objects under that path, or, for example:

for s3obj in s3list(bucket, path, start='2023/05/01', end='2023/06', list_dirs=False):
    key = s3obj.key
    ...

to get just the files for the month of May 2023.

Note that s3list is a generator: you can start listing a trillion objects and stop whenever you like (internally, it goes in chunks of up to 1,000 objects per call to AWS).
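The full s3list() lives in the linked post, but a minimal sketch of the same idea is not hard to write yourself. The version below is my own (the name, signature, and the choice of taking a low-level client instead of a Bucket resource are assumptions, and it omits list_dirs): it uses the real list_objects_v2 paginator and the StartAfter parameter, and cuts off the stream once keys pass the end of the range.

```python
def s3list(client, bucket, path, start=None, end=None):
    """Yield keys under `path`, optionally limited to the lexicographic
    range [path/start, path/end). Streams page by page, so iteration
    can stop early without listing the whole prefix."""
    prefix = path.rstrip('/') + '/'
    kwargs = {'Bucket': bucket, 'Prefix': prefix}
    if start is not None:
        # S3 starts listing strictly *after* this key, which works here
        # because real keys ('2023/05/01/01/...') sort after '2023/05/01'.
        kwargs['StartAfter'] = prefix + start
    paginator = client.get_paginator('list_objects_v2')
    for page in paginator.paginate(**kwargs):
        for obj in page.get('Contents', []):
            if end is not None and obj['Key'] >= prefix + end:
                return
            yield obj['Key']

# Usage (assumes boto3 is installed and AWS credentials are configured):
# client = boto3.client('s3')
# for key in s3list(client, 'bucket-name',
#                   'splunk-kinesis-firehose/splunk-failed',
#                   start='2023/05/01', end='2023/06'):
#     ...
```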

John Rotenstein

To list objects in an Amazon S3 bucket, you can use the client method list_objects_v2 (see the Boto3 documentation):

import boto3

s3_client = boto3.client('s3')

response = s3_client.list_objects_v2(
    Bucket='your-bucket-name',
    Prefix='splunk-kinesis-firehose',
)
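One caveat: a single list_objects_v2 call returns at most 1,000 keys. Boto3's paginator for this operation follows the continuation token automatically; a small helper along these lines (the function name is mine) collects everything under a prefix:

```python
def list_keys(s3_client, bucket, prefix):
    """Collect every key under `prefix`, following continuation
    tokens so results beyond the first 1,000 objects are included."""
    paginator = s3_client.get_paginator('list_objects_v2')
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page that matched nothing.
        keys.extend(obj['Key'] for obj in page.get('Contents', []))
    return keys

# Usage (assumes boto3 is installed and AWS credentials are configured):
# print(list_keys(boto3.client('s3'), 'your-bucket-name',
#                 'splunk-kinesis-firehose'))
```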

Or you can use the resource method, which is a bit more Pythonic:

import boto3

s3_resource = boto3.resource('s3')

bucket = s3_resource.Bucket('your-bucket-name')

for obj in bucket.objects.all():
    print(obj.key)
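Since the question only asks for objects under splunk-kinesis-firehose, the resource API can also filter by prefix rather than iterating over the whole bucket with .all(). A small helper (the function name is mine) might look like:

```python
def keys_under_prefix(bucket, prefix):
    """Return the keys of all objects under `prefix`, using the
    resource-style collection API's server-side prefix filter."""
    return [obj.key for obj in bucket.objects.filter(Prefix=prefix)]

# Usage (assumes boto3 is installed and AWS credentials are configured):
# bucket = boto3.resource('s3').Bucket('your-bucket-name')
# print(keys_under_prefix(bucket, 'splunk-kinesis-firehose'))
```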