Automatically ingest data from S3 into Amazon Timestream?


What is the easiest way to automatically ingest CSV data from an S3 bucket into a Timestream database?

I have an S3 bucket that is continuously generating CSV files inside a folder structure. I want to save these files in a Timestream database so I can visualize them in my Grafana instance.

I already tried to do that via a Glue crawler, but that won't work for me. Is there any workaround or tutorial on how to solve this task?


2 Answers

Answer 1

I do this using a Lambda function, an SNS topic, and an SQS queue:

New files in my bucket trigger a notification on an SNS topic.

The notification gets added to an SQS queue.

The Lambda function consumes the queue, retrieves the bucket and key of the new S3 object, downloads the CSV file, does some processing, and ingests the data into Timestream. The Lambda is implemented in Python; a minimal sketch follows.
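Roughly, such a handler could look like this. This is a minimal sketch, not my exact code: the database, table, and CSV columns (`time` in epoch milliseconds, `device_id`, `temperature`) are placeholders for your own schema, and it assumes standard SNS-to-SQS delivery, so the S3 event is nested inside the SNS envelope:

```python
import csv
import io
import json

import boto3

s3 = boto3.client("s3")
timestream = boto3.client("timestream-write")

# Placeholder names -- substitute your own database and table.
DATABASE_NAME = "my_database"
TABLE_NAME = "my_table"


def lambda_handler(event, context):
    # Each SQS record wraps an SNS notification, which in turn
    # wraps the original S3 event.
    for sqs_record in event["Records"]:
        sns_envelope = json.loads(sqs_record["body"])
        s3_event = json.loads(sns_envelope["Message"])
        for s3_record in s3_event["Records"]:
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            ingest_csv(body.decode("utf-8"))


def ingest_csv(text):
    # Assumed columns: time (epoch ms), device_id, temperature.
    reader = csv.DictReader(io.StringIO(text))
    batch = []
    for row in reader:
        batch.append({
            "Time": row["time"],  # epoch milliseconds, as a string
            "Dimensions": [{"Name": "device_id", "Value": row["device_id"]}],
            "MeasureName": "temperature",
            "MeasureValue": row["temperature"],
            "MeasureValueType": "DOUBLE",
        })
        # WriteRecords accepts at most 100 records per call.
        if len(batch) == 100:
            timestream.write_records(
                DatabaseName=DATABASE_NAME, TableName=TABLE_NAME, Records=batch
            )
            batch = []
    if batch:
        timestream.write_records(
            DatabaseName=DATABASE_NAME, TableName=TABLE_NAME, Records=batch
        )
```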

This has been working OK, with the caveat that large files may not ingest fully within the Lambda 15-minute limit. Timestream ingestion is not super fast. It gets better by using multi-valued records, as well as the "common attributes" feature of the Timestream client in boto3.
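For illustration, continuing with the client and names from the sketch above, a multi-measure write using CommonAttributes could look like this (the measure names are hypothetical):

```python
# Attributes shared by every record in the call are sent only once.
common_attributes = {
    "Dimensions": [{"Name": "device_id", "Value": "sensor-1"}],
    "MeasureName": "metrics",
    "MeasureValueType": "MULTI",
    "TimeUnit": "MILLISECONDS",
}

# Each record carries several measures, so one CSV row becomes one
# record instead of one record per measure.
records = [
    {
        "Time": "1700000000000",
        "MeasureValues": [
            {"Name": "temperature", "Value": "21.5", "Type": "DOUBLE"},
            {"Name": "humidity", "Value": "40.2", "Type": "DOUBLE"},
        ],
    },
]

timestream.write_records(
    DatabaseName=DATABASE_NAME,
    TableName=TABLE_NAME,
    CommonAttributes=common_attributes,
    Records=records,
)
```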

(It should be noted that the Lambda can be triggered directly by the S3 bucket, if one prefers. Using a queue allows a bit more flexibility, such as being able to manually add files to the queue for reprocessing.)
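For example, reprocessing an old file can be as simple as hand-crafting the SNS-wrapped S3 event and pushing it onto the queue. A sketch, with the queue URL, bucket, and key as placeholders:

```python
import json

import boto3

sqs = boto3.client("sqs")

# Placeholder queue URL, bucket, and key.
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/ingest-queue"
fake_s3_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"},
                "object": {"key": "data/old-file.csv"}}}
    ]
}

# Wrap the S3 event the same way SNS would, so the Lambda's parsing
# code path works unchanged.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({"Message": json.dumps(fake_s3_event)}),
)
```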

Answer 2

There is now a feature called batch load, which lets you ingest CSV files from S3 directly into Timestream.

You can read about it in the AWS Timestream documentation.
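Kicking off a batch load task from boto3 could look roughly like this. This is a sketch under assumptions: all bucket, database, table, and column names are placeholders, and the data-model mapping depends on your CSV layout:

```python
import boto3

timestream = boto3.client("timestream-write")

timestream.create_batch_load_task(
    TargetDatabaseName="my_database",
    TargetTableName="my_table",
    DataSourceConfiguration={
        "DataSourceS3Configuration": {
            "BucketName": "my-source-bucket",
            "ObjectKeyPrefix": "csv-folder/",
        },
        "DataFormat": "CSV",
    },
    DataModelConfiguration={
        "DataModel": {
            "TimeColumn": "time",
            "TimeUnit": "MILLISECONDS",
            "DimensionMappings": [
                {"SourceColumn": "device_id", "DestinationColumn": "device_id"},
            ],
            "MultiMeasureMappings": {
                "MultiMeasureAttributeMappings": [
                    {"SourceColumn": "temperature", "MeasureValueType": "DOUBLE"},
                ],
            },
        },
    },
    # Batch load writes a success/error report to S3.
    ReportConfiguration={
        "ReportS3Configuration": {
            "BucketName": "my-report-bucket",
            "ObjectKeyPrefix": "batch-load-reports/",
        },
    },
)
```

Note that a batch load task is a one-off job over the given S3 prefix, so for a bucket that keeps producing files you would still need something (for example, a scheduled job or the Lambda approach above) to create new tasks as files arrive.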