PySpark performance slow when reading a large fixed-width file with long lines and converting it to a structured format


I am trying to convert a fairly large (34 GB) fixed-width file into a structured format using PySpark, but the job takes far too long to complete (over 10 hours). The file has very long lines (almost 50K characters each), which I split with substring into roughly 5,000 columns before storing the result in a Parquet table. If anyone has faced and resolved a similar issue, any suggestions are greatly appreciated. We run Spark 3.1.1 through Google's Spark Kubernetes Operator on an OpenShift cluster.
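For reference, this is a minimal sketch of the substring-based split I am doing; the input/output paths, field positions, and column names below are placeholders, and the real job defines around 5,000 fields:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fixed-width-to-parquet").getOrCreate()

# Each record is one long fixed-width line; read it as a single string column ("value").
raw = spark.read.text("/data/fixed_width_input.txt")

# (name, start_position, length) for each field; positions are 1-based for substring.
# The real field list has ~5,000 entries.
field_specs = [
    ("col_1", 1, 10),
    ("col_2", 11, 25),
    ("col_3", 36, 8),
]

# Build all substring expressions in a single select so Spark produces one projection.
columns = [
    F.substring(F.col("value"), start, length).alias(name)
    for name, start, length in field_specs
]

parsed = raw.select(*columns)

# Write the structured result out as a Parquet table.
parsed.write.mode("overwrite").parquet("/data/fixed_width_output.parquet")
```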
