Data truncation error in AWS Glue job while transferring data from S3 to Aurora


I am trying to transfer data from an S3 bucket (address.csv) to AWS Aurora (MySQL) using AWS Glue. When I run the following script, one of the columns, named "po_box_number" (a varchar of length 10), gives me an error: "An error occurred while calling o195.pyWriteDynamicFrame. Data truncation: Data too long for column 'po_box_number' at row 1". When I increased the size of the column for diagnostic purposes, I saw that the data is stored in a JSON-like format. Suppose the value I need is "100"; it is stored as {"long": 100, "string": null}. Similarly, if I try to store "E101", it is stored as {"long": null, "string": "E101"}.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from datetime import datetime
from pyspark.sql.functions import lit
from awsglue.dynamicframe import DynamicFrame

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## @type: DataSource
## @args: [database = "db1", table_name = "tb1", transformation_ctx = "datasource0"]
## @return: datasource0
## @inputs: []
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "db1", table_name = "tb1", transformation_ctx = "datasource0")   

#applymapping1 = Map.apply(frame = datasource0, f = AddProcessedTime)

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("col6", "string", "po_box_number", "string")], transformation_ctx = "applymapping1")

#applymapping1 = ResolveChoice.apply(applymapping1, specs = [("po_box_number", "cast:string")])
datasink5 = glueContext.write_dynamic_frame.from_catalog(frame = applymapping1, database = "db1", table_name = "tb2", transformation_ctx = "datasink5")
job.commit()
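For reference, the {"long": ..., "string": ...} shape suggests Glue inferred a choice type for the column (mixed numeric and string values in the source CSV). A minimal diagnostic sketch to confirm this, reusing the datasource0 frame from the script above:

# Diagnostic sketch: inspect the schema Glue inferred for the source table.
# A column with mixed values typically shows up as a choice of long and string.
datasource0.printSchema()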

There is 1 best solution below


There was some corrupted data in my S3 bucket that was responsible for the conversion into JSON. As soon as I removed it, everything worked as expected.
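If removing the corrupted records is not an option, another approach (a sketch along the lines of the ResolveChoice line commented out in the question, assuming the applymapping1 frame and glueContext from that script) is to cast the choice column to a single type before writing, so the DynamicFrame no longer carries the mixed long/string representation:

# Sketch: resolve the mixed-type column to a plain string before the write.
resolved1 = ResolveChoice.apply(frame = applymapping1, specs = [("po_box_number", "cast:string")], transformation_ctx = "resolved1")
datasink5 = glueContext.write_dynamic_frame.from_catalog(frame = resolved1, database = "db1", table_name = "tb2", transformation_ctx = "datasink5")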