Read TXT or DAT file in Python


I need to read a .DAT or .TXT file, extract the column names, rename them, and load the data into a pandas DataFrame.

I have an environment variable called 'filetype'. Based on its value (DAT or TXT), I need to read the file accordingly, extract the column names from it, and assign new column names.

My input .dat/.txt file has just 2 columns and looks like this:

LN_ID,LN_DT

1234,10/01/2020

4567,10/01/2020

8888,10/01/2020

9999,10/01/2020

I want to read the above file, rename the columns so that new_ln_id=LN_ID and new_ln_dt=LN_DT, and load the result into a pandas DataFrame.

I've tried using pandas as shown below, but it gives an error. I also want to check first whether myfile is .dat or .txt, based on the value of the environment variable 'filetype', and then proceed.

df=pd.read_csv('myfile.dat',sep=',')

new_cols=['new_ln_id','new_ln_dt']

df.columns=new_cols

I think there could be a better and easier way. I'd appreciate any help. Thanks!


There is 1 best solution below

Serge de Gosson de Varennes

It is unclear from your question whether you want two new empty columns or want to replace the existing names. Both options are shown here.

Add columns

Given the DataFrame dte:

  LN_ID       LN_DT
0   1234  10/01/2020
1   4567  10/01/2020
2   8888  10/01/2020
3   9999  10/01/2020

Define the new columns:

cols = ['new_ln_id','new_ln_dt']

and concatenate dte with an empty DataFrame that has those columns:

print(pd.concat([dte,pd.DataFrame(columns=cols)]))

which gives

    LN_ID       LN_DT new_ln_id new_ln_dt
0  1234.0  10/01/2020       NaN       NaN
1  4567.0  10/01/2020       NaN       NaN
2  8888.0  10/01/2020       NaN       NaN
3  9999.0  10/01/2020       NaN       NaN
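
Note that in the output above the concatenation turned LN_ID into floats. A minimal sketch of one possible alternative is reindex, which adds the empty columns without touching the existing ones. The frame is rebuilt here from the sample data in the question, and the names raw and wide are only illustrative:

```python
import io
import pandas as pd

# Sample data from the question, read from an in-memory buffer for illustration
raw = "LN_ID,LN_DT\n1234,10/01/2020\n4567,10/01/2020\n8888,10/01/2020\n9999,10/01/2020\n"
dte = pd.read_csv(io.StringIO(raw), sep=",")

cols = ['new_ln_id', 'new_ln_dt']

# reindex keeps the existing columns as they are and fills the new ones with NaN
wide = dte.reindex(columns=[*dte.columns, *cols])
print(wide)
```

With this approach LN_ID stays an integer column, because only the set of column labels changes.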

Replace column names

df = df.rename(columns={"LN_ID": "new_ln_id", "LN_DT": "new_ln_dt"})
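
One detail that is easy to miss: rename returns a new DataFrame and leaves the original untouched unless the result is assigned back. A small sketch with a shortened copy of the sample data (the names raw, mapping, unchanged and renamed are only for illustration):

```python
import io
import pandas as pd

# Shortened sample data from the question
raw = "LN_ID,LN_DT\n1234,10/01/2020\n4567,10/01/2020\n"
df = pd.read_csv(io.StringIO(raw), sep=",")

mapping = {"LN_ID": "new_ln_id", "LN_DT": "new_ln_dt"}

df.rename(columns=mapping)       # result is discarded, df keeps the old names
unchanged = list(df.columns)

df = df.rename(columns=mapping)  # result is kept
renamed = list(df.columns)

print(unchanged, renamed)
```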

Thanks for your response, and sorry for the confusion. I want to rename the 2 columns. But first I want to check whether it's a .dat or .txt file, based on a Unix environment variable called 'filetype'.

For example: if filetype is 'TXT' or 'DAT', read the input file (say 'abc.txt' or 'abc.dat') into a new pandas DataFrame and rename the 2 columns. I hope that's clear.

Here is what I did. I created a function that checks whether the file type is "dat" or "txt", reads the file into a pandas DataFrame, and then renames the 2 columns. The function loads the data, but it does not rename the columns as required. I'd appreciate it if anyone could point out what I am missing.

filetype=os.environ['TYPE']
print(filetype)
DAT

    def load(file_type):
        if file_type.lower()=="dat":
            df=pd.read_csv(input_file, sep=',',engine='python')
            if df.columns[0]=="LN_ID":
                df.columns[0]="new_ln_id"
            if df.columns[1]=="LN_DT":
                df.columns[1]="new_ln_dt"
            return(df)
        else:
            if file_type.lower()=="txt":
                df=pd.read_csv("infile",sep=",",engine='python')
                if df.columns[0]=="LN_ID":
                    df.columns[0]="new_ln_id"
                if df.columns[1]=="LN_DT":
                    df.columns[1]="new_ln_dt"
            return(df)
    
    load(filetype)
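
A likely cause of the problem in the function above: a pandas Index is immutable, so an item assignment such as df.columns[0]="new_ln_id" raises a TypeError instead of renaming anything. The two branches also read the file in exactly the same way, so they can be merged. Below is a minimal sketch of one way to restructure it. The input_file parameter, the RENAMES constant and the temporary demo file are assumptions added for illustration, not part of the original code:

```python
import os
import tempfile
import pandas as pd

RENAMES = {"LN_ID": "new_ln_id", "LN_DT": "new_ln_dt"}

def load(file_type, input_file):
    """Read a comma-separated .dat or .txt file and rename its two columns."""
    if file_type.lower() not in ("dat", "txt"):
        raise ValueError(f"unsupported file type: {file_type!r}")
    df = pd.read_csv(input_file, sep=",")
    # rename() builds a new DataFrame; labels missing from the file are ignored
    return df.rename(columns=RENAMES)

# Demo with a hard-coded type; in the question it comes from os.environ['TYPE']
filetype = "DAT"
with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical input file named after the 'abc.dat' example
    sample = os.path.join(tmp, "abc." + filetype.lower())
    with open(sample, "w") as fh:
        fh.write("LN_ID,LN_DT\n1234,10/01/2020\n4567,10/01/2020\n")
    df = load(filetype, sample)

print(df.columns.tolist())
```

Raising an error for any other file type avoids returning a variable that was never assigned, which is what the original else branch would do.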

Alternative

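import os
import pandas as pd

path = "."  # assumption: directory that holds the input file
# Keep regular files only, skipping sub-directories
onlyfiles = [f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))]
filename = os.path.join(path, onlyfiles[0])
# Both extensions hold comma-separated text, so they are read the same way
if filename.lower().endswith(('.txt', '.dat')):
    dte = pd.read_csv(filename, sep=",")
else:
    raise ValueError("expected a .txt or .dat file, got " + filename)

# rename() returns a new DataFrame, so assign the result back
dte = dte.rename(columns={"LN_ID": "new_ln_id", "LN_DT": "new_ln_dt"})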