I am trying to read JSON files from an S3 prefix into a Glue DynamicFrame. I also want to add a column containing the file name, so that I can identify the source file of each record. I am trying the following, using the attachFilename option:
    def read_data(self) -> DynamicFrame:
        dyf = self.glue_context.create_dynamic_frame.from_options(
            connection_type="s3",
            connection_options={
                "paths": [f"s3://{self.args['SOURCE_S3_BUCKET']}/{self.args['SOURCE_S3_KEY']}"],
                "recurse": True
            },
            format="json",
            format_options={
                "jsonPath": "$",
                "multiline": True,
                "attachFilename": "source_file_name"
            },
            transformation_ctx=f"extract_data"
        )
        print("dyf")
        print(dyf.show(2))
        return dyf
But I get the following error:
    An error occurred while calling o118.toDF. source_file_name already exists
I have tried changing the column name passed to attachFilename, but I get the same "already exists" error whatever name I use. Can someone please help me identify what I am doing wrong?