Spark reading in fixed width file

I'm new to Spark (less than 1 month!) and am working with raw input from a flat file that is fixed width. I am using sqlContext to read the file in via com.databricks.spark.csv, and then using .withColumn to substring the rows based on the set widths:

    from pyspark.sql.functions import trim
    rawData.withColumn("ID", trim(rawData['c0'].substr(1, 8)))
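
For reference, a minimal sketch of that pattern end to end; the column names, offsets, and widths below are made-up placeholders for illustration, not the real file layout:

    from pyspark.sql.functions import trim

    # Assumed layout for illustration: ID in chars 1-8, STATUS in chars 9-10,
    # and the variable-width field from char 11 on ('c0' holds each raw line).
    fixed = (rawData
             .withColumn("ID",      trim(rawData['c0'].substr(1, 8)))
             .withColumn("STATUS",  trim(rawData['c0'].substr(9, 2)))
             .withColumn("VARDATA", trim(rawData['c0'].substr(11, 1000))))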

The issue I am encountering is that the last field has a variable width. It has a fixed start point, but a variable number of 'sets' of data, each around 20 characters wide. For example:

Row 1  A 1243 B 42225 C 23213 
Row 2  A 12425
Row 3  A 111 B 2222 C 3 D 4 E55555

I need to eventually read in those variable fields, pull out just the first character of each group in the variable-width column, and then transpose so that the output looks like:

Row 1 A
Row 1 B
Row 1 C
Row 2 A
...
Row 3 D
Row 3 E

I've read in the fixed-width columns I need, but I am stuck on the variable-width field.

1 Answer

BEST ANSWER

zipWithIndex and explode can transpose the data into one row per element:

    import org.apache.spark.sql.functions.explode
    import sqlContext.implicits._  // spark.implicits._ on Spark 2.x

    sc.textFile("csv.data")
      .map(_.split("\\s+"))
      .zipWithIndex
      .toDF("dataArray", "rowId")
      .select($"rowId", explode($"dataArray"))
      .show(false)

+-----+------+
|rowId|col   |
+-----+------+
|0    |A     |
|0    |1243  |
|0    |B     |
|0    |42225 |
|0    |C     |
|0    |23213 |
|1    |A     |
|1    |12425 |
|2    |A     |
|2    |111   |
|2    |B     |
|2    |2222  |
|2    |C     |
|2    |3     |
|2    |D     |
|2    |4     |
|2    |E55555|
+-----+------+
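
The exploded output above still contains the numeric tokens. Since the goal is only the first character of each group, one option is to filter for tokens that start with a letter and take their first character. Here is a minimal PySpark sketch of the full pipeline under that assumption (same made-up "csv.data" file name as in the answer):

    from pyspark.sql.functions import col, explode

    exploded = (sc.textFile("csv.data")
                  .map(lambda line: line.split())  # split each line on whitespace
                  .zipWithIndex()                  # (tokens, rowId)
                  .toDF(["dataArray", "rowId"])
                  .select("rowId", explode("dataArray").alias("token")))

    # Keep only group markers (so 'E55555' yields 'E'), then take the first char.
    (exploded
       .filter(col("token").rlike("^[A-Za-z]"))
       .select("rowId", col("token").substr(1, 1).alias("group"))
       .show())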