aws wrangler (pandas layer). problem with path to S3 bucket

435 Views Asked by bob At 30 April 2023 at 19:54

Here is my Python code in my Lambda layer. Shout-out to John R for some of this paginator code. From API Gateway, I pass in a path parameter (bucket) and query string parameters (fmt and date), such as:

https://3snk9o61.execute-api.us-east-1.amazonaws.com/v1/br-candles?fmt=json&date=today

This code is probably overly convoluted, but it works. My problem is on this line:

raw_df = wr.s3.read_csv(path1,path2, use_threads=True)

The commented-out line above it is the original and works fine, but I don't want to parse the whole bucket's contents. I want the dataframe to be limited to just the specific objects taken from "object_list". The error that I get, "no files found on s3://br-candles/br4.csv", implies that it is not seeing multiple files. It is just finding the first file, but it is supposed to parse a list of files. This is probably a very simple fix, but I would appreciate any advice. Thanks.

import json
import base64
import awswrangler as wr
import boto3

def lambda_handler(event, context):
    
    s3 = boto3.client('s3')
    object_list = []
    bucket_name = event['pathParameters']['bucket']
    
    format = event['queryStringParameters']['fmt']
    day = event['queryStringParameters']['date']
    print(day)
    paginator = s3.get_paginator("list_objects_v2")
    page_iterator = paginator.paginate(Bucket=bucket_name)
    for result in page_iterator:
      object_list += filter(lambda obj: obj['Key'].endswith('.csv'), result['Contents'])
    object_list.sort(key=lambda x: x['LastModified'])
    
    A = (object_list[-1]['Key'])
    B = (object_list[-4]['Key'])
    full_path = f"s3://{bucket_name}"
    path1 = f"s3://{bucket_name}/{A}"
    path2 = f"s3://{bucket_name}/{B}"
    #raw_df = wr.s3.read_csv(path=full_path, path_suffix=['.csv'], use_threads=True)
    raw_df = wr.s3.read_csv(path1,path2, use_threads=True)
    
    for df in raw_df:
      if day == 'today':
       
        etc.etc.. no issues below

Answered by bob on 01 May 2023 at 19:02

I solved it by passing the paths as a single list to the path argument:

 raw_df = wr.s3.read_csv(path=[f'{full_path}/{A}', f'{full_path}/{B}'], use_threads=True)

This way, only the few objects that I want are read into the dataframe.
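
For anyone landing here later, a likely reason the original call failed: as far as I can tell from the awswrangler documentation, the second positional parameter of wr.s3.read_csv is path_suffix, so path2 was probably being treated as a suffix filter rather than as a second file. The sketch below shows one way to build the list of paths before handing it to the path keyword. The helper name newest_csv_paths, its positions parameter, and the sample listing are made up for illustration; the real objects would come from the list_objects_v2 paginator in the question.

```python
from datetime import datetime, timezone


def newest_csv_paths(bucket_name, objects, positions=(-1, -4)):
    """Build s3:// URIs for selected CSV objects, ordered by LastModified.

    `objects` is a list of dicts shaped like the "Contents" entries returned
    by S3 list_objects_v2 (each has "Key" and "LastModified").
    `positions` are indices into the list sorted oldest to newest
    (hypothetical parameter, defaulting to the -1 and -4 used in the question).
    """
    csv_objects = [obj for obj in objects if obj["Key"].endswith(".csv")]
    csv_objects.sort(key=lambda obj: obj["LastModified"])
    return [f"s3://{bucket_name}/{csv_objects[i]['Key']}" for i in positions]


# Made-up listing standing in for the paginator results.
sample_objects = [
    {"Key": f"br{n}.csv", "LastModified": datetime(2023, 4, 20 + n, tzinfo=timezone.utc)}
    for n in range(1, 6)
] + [{"Key": "notes.txt", "LastModified": datetime(2023, 4, 30, tzinfo=timezone.utc)}]

paths = newest_csv_paths("br-candles", sample_objects)
print(paths)

# In the Lambda, the list would then be passed by keyword:
#   raw_df = wr.s3.read_csv(path=paths, use_threads=True)
```

Passing the list by keyword avoids any ambiguity about which positional slot each string lands in. If the bucket could be empty, reading each page with result.get('Contents', []) instead of result['Contents'] would avoid a KeyError in the listing loop.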
