I have to validate the schema of files that I read from S3 into an AWS Glue DynamicFrame.
How can I efficiently validate the schema of every record?
I tried converting the DynamicFrame to a DataFrame and validating each record against a schema with the jsonschema Python library:
import jsonschema

# collect() pulls every row to the driver; each Row is validated as a plain dict
for row in dynamic_frame.toDF().collect():
    jsonschema.validate(row.asDict(), schema)
The flaw is that if one record has the columns
[A:1,B:2,C:3]
and another has
[A:11,B:22,C:33,D:44]
then converting them to a DataFrame turns the first record into
[A:1,B:2,C:3,D:None]
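but originally the first record did not have a "D" column. The DataFrame gives all records a single merged schema and fills missing fields with nulls, which makes per-record schema validation difficult.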
In the schema validation I want to check for datatype changes, additional columns, and field lengths.
Please help; any suggestions are welcome. Thanks.
I have JSON files in S3.
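
To illustrate the per-record checks I have in mind, here is a minimal sketch that validates each raw JSON line before any DataFrame conversion, so missing keys are never padded with None. It uses only the standard library rather than jsonschema, and it assumes newline-delimited JSON. The field names, expected types, and length limit in EXPECTED are hypothetical, as are the helper names validate_record and validate_lines.

```python
import json

# Hypothetical expected schema: field name -> (Python type, max length or None).
EXPECTED = {
    "A": (int, None),
    "B": (int, None),
    "C": (str, 10),
}


def validate_record(record, expected=EXPECTED):
    """Return a list of problems found in one parsed JSON object."""
    errors = []
    # Additional-column check: keys the expected schema does not know about.
    for key in sorted(set(record) - set(expected)):
        errors.append(f"unexpected column: {key}")
    for key, (expected_type, max_len) in expected.items():
        if key not in record:
            errors.append(f"missing column: {key}")
            continue
        value = record[key]
        # Datatype check; bool is a subclass of int, so reject it unless asked for.
        type_ok = isinstance(value, expected_type) and not (
            isinstance(value, bool) and expected_type is not bool
        )
        if not type_ok:
            errors.append(f"wrong type for {key}: {type(value).__name__}")
        elif max_len is not None and len(value) > max_len:
            # Length check, only meaningful for sized values such as strings.
            errors.append(f"{key} longer than {max_len}")
    return errors


def validate_lines(lines):
    """Yield (line_number, errors) for every invalid newline-delimited JSON record."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        errors = validate_record(json.loads(line))
        if errors:
            yield number, errors


sample = [
    '{"A": 1, "B": 2, "C": "x"}',
    '{"A": 11, "B": 22, "C": "y", "D": 44}',
    '{"A": "1", "B": 2, "C": "this value is too long"}',
]
for number, errors in validate_lines(sample):
    print(number, errors)
```

With jsonschema, the same three checks would presumably map to the `type`, `additionalProperties: false`, and `maxLength` keywords, as long as validation runs on the raw records rather than on rows taken from the DataFrame. How best to get at the raw lines inside a Glue job is the part I have not worked out.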