I need to export data from DynamoDB to S3 through a Kinesis data stream and a Firehose stream, converting the records to Parquet. I'm having trouble setting up the Parquet conversion in the Firehose stream, because it requires selecting a Glue table that holds the schema of the data coming into Firehose. I've tried many things; the only setups that produced any output either wrote only None values or wrote non-empty data that isn't what I need. Another option is to use a Lambda function to convert the records to Parquet, but I would still need to set up this schema.
This is the data coming into Firehose (two sample records):
{"awsRegion":"us-west-2","eventID":"ebb59b0b-247d-49cc-83fc-ccc1988481ed","eventName":"INSERT","userIdentity":null,"recordFormat":"application/json","tableName":"table-oleg","dynamodb":{"ApproximateCreationDateTime":1710435429661487,"Keys":{"user_id":{"S":"nvbssdfbdfbsdvsdv"}},"NewImage":{"autonomy":{"S":"ssnvbnbdfbdfsdvsdv"},"user_id":{"S":"nvbssdfbdfbsdvsdv"}},"SizeBytes":74,"ApproximateCreationDateTimePrecision":"MICROSECOND"},"eventSource":"aws:dynamodb"}{"awsRegion":"us-west-2","eventID":"0f30f744-2d94-4494-bd55-e0278942cccb","eventName":"INSERT","userIdentity":null,"recordFormat":"application/json","tableName":"table-oleg","dynamodb":{"ApproximateCreationDateTime":1710435451018854,"Keys":{"user_id":{"S":"nvbssdfbdfbSVSV"}},"NewImage":{"autonomy":{"S":"ssnvbnbdfbdfsEV"},"user_id":{"S":"nvbssdfbdfbSVSV"}},"SizeBytes":67,"ApproximateCreationDateTimePrecision":"MICROSECOND"},"eventSource":"aws:dynamodb"}
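The two records above are written back to back with no separator, which is typically how Firehose writes records to S3 when no delimiter is added. A minimal sketch, assuming the objects only need to be split apart and the two wanted fields read from `dynamodb.NewImage`; `split_concatenated_json` and `extract_columns` are hypothetical helper names. It uses `json.JSONDecoder.raw_decode` to walk the string one object at a time, and shows that each value sits inside a DynamoDB type descriptor (`{"S": ...}`):

```python
import json
from typing import Any, Dict, Iterator, List


def split_concatenated_json(text: str) -> Iterator[Dict[str, Any]]:
    """Yield each JSON object from a string of objects written back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it manually
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def extract_columns(record: Dict[str, Any]) -> Dict[str, str]:
    """Flatten one DynamoDB stream record into the two wanted columns."""
    new_image = record["dynamodb"]["NewImage"]
    # Each attribute is wrapped in a type descriptor such as {"S": "..."}
    return {
        "autonomy": new_image["autonomy"]["S"],
        "user_id": new_image["user_id"]["S"],
    }


if __name__ == "__main__":
    # Trimmed copy of the two sample records (other fields omitted)
    raw = (
        '{"eventName":"INSERT","dynamodb":{"NewImage":{"autonomy":{"S":"ssnvbnbdfbdfsdvsdv"},'
        '"user_id":{"S":"nvbssdfbdfbsdvsdv"}}}}'
        '{"eventName":"INSERT","dynamodb":{"NewImage":{"autonomy":{"S":"ssnvbnbdfbdfsEV"},'
        '"user_id":{"S":"nvbssdfbdfbSVSV"}}}}'
    )
    rows: List[Dict[str, str]] = [extract_columns(r) for r in split_concatenated_json(raw)]
    print(rows)
```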
I only need these two columns and their values written to S3 (values shown from the second record):
autonomy: ssnvbnbdfbdfsEV
user_id: nvbssdfbdfbSVSV
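For the Lambda route, one approach is to flatten each record in a Firehose data-transformation Lambda so that only `autonomy` and `user_id` reach the Parquet conversion step; the Glue table then needs just two `string` columns. A minimal sketch, assuming a Python Lambda and the standard Firehose transformation contract (each input record carries a `recordId` and base64 `data`; each output record returns the same `recordId`, a `result` of `Ok`, `Dropped` or `ProcessingFailed`, and base64 `data`). Dropping records without a `NewImage` (for example `REMOVE` events) is an assumption about what should be kept:

```python
import base64
import json
from typing import Any, Dict, List


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, str]]]:
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            new_image = payload.get("dynamodb", {}).get("NewImage")
            if new_image is None:
                # Assumption: REMOVE events carry no NewImage and are skipped
                output.append({"recordId": record["recordId"], "result": "Dropped", "data": record["data"]})
                continue
            row = {
                "autonomy": new_image.get("autonomy", {}).get("S"),
                "user_id": new_image.get("user_id", {}).get("S"),
            }
            # Newline-delimited flat JSON whose keys match the Glue table's column names
            data = base64.b64encode((json.dumps(row) + "\n").encode("utf-8")).decode("ascii")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, AttributeError):
            # Malformed base64/JSON or unexpected shape: let Firehose route it as a failure
            output.append({"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]})
    return {"records": output}
```

With this in place, the deserializer in the Firehose conversion settings sees flat objects like `{"autonomy": "ssnvbnbdfbdfsEV", "user_id": "nvbssdfbdfbSVSV"}` instead of the nested DynamoDB stream format.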
These are my Firehose stream settings for conversion to Parquet
This is my JSON schema in the Glue table
These are my Glue table settings
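Firehose's record format conversion is driven by the Glue table's column names and Hive types, not by a JSON Schema document; incoming JSON is parsed with the deserializer chosen in the stream (OpenX or Hive JSON SerDe) and matched to those columns by key. A minimal sketch of both pieces as boto3 request dicts, assuming the records have already been flattened to `{"autonomy": ..., "user_id": ...}`. The table name, S3 location, role ARN and region are placeholders, and `build_table_input`, `build_conversion_config` and `create_table` are hypothetical helpers:

```python
import json
from typing import Any, Dict


def build_table_input(table_name: str, s3_location: str) -> Dict[str, Any]:
    """Glue TableInput with two string columns and Parquet storage."""
    return {
        "Name": table_name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "StorageDescriptor": {
            # Firehose matches these column names against the keys of each JSON record
            "Columns": [
                {"Name": "autonomy", "Type": "string"},
                {"Name": "user_id", "Type": "string"},
            ],
            "Location": s3_location,
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    }


def build_conversion_config(role_arn: str, database: str, table: str, region: str) -> Dict[str, Any]:
    """DataFormatConversionConfiguration for an extended S3 destination."""
    return {
        "Enabled": True,
        "SchemaConfiguration": {
            "RoleARN": role_arn,
            "DatabaseName": database,
            "TableName": table,
            "Region": region,
            "VersionId": "LATEST",
        },
        # Input is JSON; the OpenX SerDe lowercases keys by default before matching
        "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
        "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
    }


def create_table(database: str, table_input: Dict[str, Any]) -> None:
    import boto3  # imported here so the builders above work without AWS access

    boto3.client("glue").create_table(DatabaseName=database, TableInput=table_input)


if __name__ == "__main__":
    # Placeholder values (hypothetical)
    table = build_table_input("table_oleg_parquet", "s3://my-bucket/dynamodb-export/")
    conversion = build_conversion_config(
        "arn:aws:iam::123456789012:role/firehose-glue-access", "my_database", "table_oleg_parquet", "us-west-2"
    )
    print(json.dumps(table, indent=2))
    print(json.dumps(conversion, indent=2))
```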
Thank you in advance for your help!
I've tried using many schema examples found on the internet. I also attempted the example from the official AWS website:
{
  "$id": "https://example.com/person.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}
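The example above is a JSON Schema (draft-07) document, which is a different format from what the Glue table holds: each Glue column is a name plus a Hive type string such as `string`, `bigint` or `struct<...>`. A minimal sketch, assuming only the basic JSON Schema keywords `type`, `properties` and `items`, of hypothetical `to_hive_type` / `to_glue_columns` helpers that translate such a schema into Glue columns; the integer to `bigint` and number to `double` choices are assumptions. Applied to the raw DynamoDB record shape (no Lambda), the same mapping yields a single nested `struct` column named `dynamodb`; whether the mixed-case nested names (`NewImage`, `S`) still match after Glue and the OpenX SerDe normalize casing is something to verify:

```python
from typing import Any, Dict, List

# Assumed mapping from JSON Schema scalar types to Hive types
SCALARS = {"string": "string", "integer": "bigint", "number": "double", "boolean": "boolean"}


def to_hive_type(schema: Dict[str, Any]) -> str:
    """Recursively translate a JSON Schema node into a Hive type string."""
    kind = schema.get("type")
    if kind == "object":
        fields = ",".join(f"{name}:{to_hive_type(sub)}" for name, sub in schema.get("properties", {}).items())
        return f"struct<{fields}>"
    if kind == "array":
        return f"array<{to_hive_type(schema.get('items', {'type': 'string'}))}>"
    if kind in SCALARS:
        return SCALARS[kind]
    raise ValueError(f"unsupported JSON Schema type: {kind!r}")


def to_glue_columns(schema: Dict[str, Any]) -> List[Dict[str, str]]:
    """Top-level properties become Glue columns."""
    return [{"Name": name, "Type": to_hive_type(sub)} for name, sub in schema["properties"].items()]


if __name__ == "__main__":
    person = {
        "type": "object",
        "properties": {
            "firstName": {"type": "string"},
            "lastName": {"type": "string"},
            "age": {"type": "integer", "minimum": 0},
        },
    }
    print(to_glue_columns(person))

    # Shape of the raw stream record, keeping only the path to the two wanted values
    s_attr = {"type": "object", "properties": {"S": {"type": "string"}}}
    raw_record = {
        "type": "object",
        "properties": {
            "dynamodb": {
                "type": "object",
                "properties": {
                    "NewImage": {
                        "type": "object",
                        "properties": {"autonomy": s_attr, "user_id": s_attr},
                    }
                },
            }
        },
    }
    print(to_glue_columns(raw_record))
```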
But so far without success...