All,
I am trying to read a file with multiple record types in Spark, but I have no clue how to do it. Can someone point out whether there is a way to do this, an existing package, or a user-contributed package on GitHub?
In the example below, we have a text file with 2 separate record types (there could be more than 2): 00X - record_ind | First_name | Last_name
0-3 record_ind
4-10 firstname
11-16 lastname
============================
00Y - record_ind | Account_#| STATE | country
0-3 record_ind
4-8 Account #
9-10 STATE
11-15 country
input.txt
------------
00XAtun Varma
00Y00235ILUSA
00XDivya Reddy
00Y00234FLCANDA
Sample output / DataFrame:
output.txt
record_ind | x_First_name | x_Last_name | y_Account | y_STATE | y_country
---------------------------------------------------------------------------
00X | Atun | Varma | null | null | null
00Y | null | null | 00235 | IL | USA
00X | Divya | Reddy | null | null | null
00Y | null | null | 00234 | FL | CANDA
One way to achieve this is to load the data with the 'text' source, so each complete row is loaded into a single column named 'value'. Then apply a UDF (or map over the rows) that parses each line based on its record indicator and transforms it so that every row follows the same schema. Finally, use that schema to create the required DataFrame and save it to the database.
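A minimal sketch of that approach, in plain Python so the per-row parser can be seen in isolation. The column names and slice positions below are assumptions taken from the question's layout and sample lines (note the sample 00X rows are shorter than the stated fixed widths, so the name fields are split on whitespace here); the Spark wiring at the bottom is shown in comments.

```python
# Target schema shared by every parsed row, per the question's desired output.
COLUMNS = ["record_ind", "x_First_name", "x_Last_name",
           "y_Account", "y_STATE", "y_country"]

def parse_row(value):
    """Map one raw line to the common 6-field tuple; unused fields are None."""
    ind = value[0:3]
    if ind == "00X":
        # Name record: the sample rows are space-separated after the indicator.
        parts = value[3:].split()
        first = parts[0] if len(parts) > 0 else None
        last = parts[1] if len(parts) > 1 else None
        return (ind, first, last, None, None, None)
    elif ind == "00Y":
        # Account record: fixed slices matching the sample, e.g. "00Y00235ILUSA".
        return (ind, None, None, value[3:8], value[8:10], value[10:15])
    # Unknown record type: keep the indicator, leave everything else null.
    return (ind, None, None, None, None, None)

# In Spark (hypothetical wiring, not executed here):
# df = (spark.read.text("input.txt")          # one column named 'value'
#         .rdd.map(lambda r: parse_row(r.value))
#         .toDF(COLUMNS))
# df.show()  # then write to the database, e.g. df.write.jdbc(...)
```

Mapping over the RDD (rather than a UDF returning a struct) keeps the example short; a UDF returning a `StructType` followed by a `select` of its fields achieves the same result.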