I have a PySpark DataFrame containing a decimal column whose schema type is Decimal(20,8). When I do df.show(), it displays 3.1E-7 as the value of that column for a particular row.
Now I am writing this DataFrame to an Avro file with fastavro, iterating over the rows with toLocalIterator. When I reach the row above, the value comes through as Decimal('3.10E-7'), and this breaks my Avro writer: the trailing zero gives the value a scale of 9, while my Avro schema expects a scale of 8. The error is:
ValueError: Scale provided in schema does not match the decimal
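For what it's worth, the mismatch can be reproduced in plain Python, independent of Spark: decimal.Decimal treats the trailing zero as a significant digit, so the exponent in as_tuple() is -9 rather than -8 (a minimal sketch):

```python
from decimal import Decimal

d = Decimal('3.10E-7')
# The trailing zero is significant, so the exponent is -9,
# i.e. a scale of 9 -- one more than the Decimal(20,8) column allows.
print(d.as_tuple())              # DecimalTuple(sign=0, digits=(3, 1, 0), exponent=-9)

# normalize() strips the trailing zero, bringing the scale back to 8.
print(d.normalize().as_tuple())  # DecimalTuple(sign=0, digits=(3, 1), exponent=-8)
```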
As a workaround, I iterate over every field of every row and, wherever the field is a Decimal, call the normalize method on it before passing the row to the Avro writer (Ref: How to reduce scale in python decimal value). I believe this makes the code slower and inefficient. Is there a better way?
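The current workaround looks roughly like the sketch below; the helper name and the commented write loop (df, out_file, parsed_schema) are made up for illustration:

```python
from decimal import Decimal

def normalize_decimals(record):
    # Return a copy of one row (as a dict) with every Decimal value
    # normalized, so a trailing zero cannot inflate the scale past 8.
    return {
        key: value.normalize() if isinstance(value, Decimal) else value
        for key, value in record.items()
    }

# Hypothetical write loop over the DataFrame:
# for row in df.toLocalIterator():
#     fastavro.writer(out_file, parsed_schema, [normalize_decimals(row.asDict())])
```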