How to normalize decimal values while iterating over dataframe rows using toLocalIterator


I have a PySpark dataframe that contains a decimal column whose type is Decimal(20,8). When I do a df.show(), it displays 3.1E-7 as the value of that column for a particular row.

Now I am trying to stream this dataframe to an Avro file using fastavro, iterating over all the rows with toLocalIterator. When I reach the row with the value above, the field contains Decimal('3.10E-7'), which breaks my Avro writer with the error below: the value ends up with a scale of 9, while my Avro schema expects a scale of 8.

ValueError: Scale provided in schema does not match the decimal
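To illustrate (not from my actual job), the mismatch can be reproduced with Python's decimal module alone; as far as I can tell, the scale fastavro checks comes from the Decimal's exponent, and the trailing zero kept by Spark pushes that exponent to -9:

```python
from decimal import Decimal

# The value Spark hands back keeps a trailing zero, so its exponent is -9,
# i.e. a scale of 9, even though numerically it equals 3.1E-7.
d = Decimal("3.10E-7")
print(d.as_tuple())              # DecimalTuple(sign=0, digits=(3, 1, 0), exponent=-9)
print(-d.as_tuple().exponent)    # 9 -> larger than the schema's scale of 8

# normalize() drops the trailing zero and brings the exponent back to -8.
print(d.normalize().as_tuple())  # DecimalTuple(sign=0, digits=(3, 1), exponent=-8)
```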

As a workaround, I iterate over every field of every row and, wherever the field is a decimal, call normalize() on it before passing the record to the Avro writer (Ref: How to reduce scale in python decimal value). I believe this makes the code slower and inefficient. Is there a better way? A simplified sketch of the current workaround is shown below.
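For reference, a minimal sketch of what I am doing now, assuming the rows come from a PySpark DataFrame `df`; `parsed_schema` and the output path are placeholders for the real Avro schema and file:

```python
from decimal import Decimal
from fastavro import writer

def normalized_records(df):
    """Yield rows as dicts, stripping trailing zeros from every Decimal field."""
    for row in df.toLocalIterator():
        rec = row.asDict()
        for key, value in rec.items():
            if isinstance(value, Decimal):
                # normalize() removes trailing zeros so the decimal's scale
                # no longer exceeds the 8 allowed by the Decimal(20,8) schema.
                rec[key] = value.normalize()
        yield rec

# parsed_schema and "out.avro" stand in for the real schema and output file.
with open("out.avro", "wb") as out:
    writer(out, parsed_schema, normalized_records(df))
```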
