Read TXT or DAT file in Python


I need to read a .DAT or .TXT file, extract the column names, map them to new names, and load the data into a pandas DataFrame.

I have an environment variable called 'filetype', and based on its value (DAT or TXT) I need to read the file accordingly, extract the column names from it, and assign new column names.

My input .dat/.txt file has just 2 columns and looks like this:

LN_ID,LN_DT
1234,10/01/2020
4567,10/01/2020
8888,10/01/2020
9999,10/01/2020

I want to read the above file, create new columns new_ln_id (from LN_ID) and new_ln_dt (from LN_DT), and write the result to a pandas DataFrame.
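A minimal sketch of the "rename" reading of this request, assuming the file is comma-separated exactly as shown. The sample is inlined with io.StringIO so the snippet runs without a file on disk; in practice you would pass the real path instead. Blank lines in a file would not matter either, since read_csv skips them by default (skip_blank_lines=True).

```python
import io

import pandas as pd

# The sample from the question, inlined so the sketch runs without a file
SAMPLE = """LN_ID,LN_DT
1234,10/01/2020
4567,10/01/2020
8888,10/01/2020
9999,10/01/2020
"""

# io.StringIO stands in for 'myfile.dat' / 'myfile.txt'
df = pd.read_csv(io.StringIO(SAMPLE), sep=",")

# Map each old column name to its new name; rename() returns a new DataFrame
df = df.rename(columns={"LN_ID": "new_ln_id", "LN_DT": "new_ln_dt"})
print(df)
```

Note that LN_DT stays a plain string column here; read_csv does not parse dates unless asked to.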

I've tried using pandas as shown below, but it gives an error. I also want to check first whether myfile is .dat or .txt, based on the value of the environment variable 'filetype', and proceed accordingly.

import pandas as pd

df = pd.read_csv('myfile.dat', sep=',')
new_cols = ['new_ln_id', 'new_ln_dt']
df.columns = new_cols
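Assigning a list with one name per column to df.columns, as above, is valid pandas, so on the two-column sample this snippet should not fail at that step; the question doesn't show the error message, so its cause can't be pinned down here. A related option, sketched under the assumption that the file always has exactly these two columns in this order, renames while reading by combining header=0 (treat the first row as a header) with names= (replace those header names):

```python
import io

import pandas as pd

# Inlined stand-in for the real file
SAMPLE = "LN_ID,LN_DT\n1234,10/01/2020\n4567,10/01/2020\n"

new_cols = ["new_ln_id", "new_ln_dt"]

# header=0 marks row 0 as the header; names= then replaces those header names,
# so the columns are renamed in the same call that reads the data
df = pd.read_csv(io.StringIO(SAMPLE), sep=",", header=0, names=new_cols)
print(df.columns.tolist())  # ['new_ln_id', 'new_ln_dt']
```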

I suspect there is a better and simpler way. I'd appreciate any help. Thanks!


It is unclear from your question whether you want two new empty columns or want to replace the existing names. Either way, here is how to do it for a DataFrame dte like this:

Add columns

  LN_ID       LN_DT
0   1234  10/01/2020
1   4567  10/01/2020
2   8888  10/01/2020
3   9999  10/01/2020

Define the new column names:

cols = ['new_ln_id','new_ln_dt']

and concatenate an empty DataFrame with those columns:

print(pd.concat([dte,pd.DataFrame(columns=cols)]))

which gives

    LN_ID       LN_DT new_ln_id new_ln_dt
0  1234.0  10/01/2020       NaN       NaN
1  4567.0  10/01/2020       NaN       NaN
2  8888.0  10/01/2020       NaN       NaN
3  9999.0  10/01/2020       NaN       NaN
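If the new columns should instead hold copies of the existing values (the new_ln_id = LN_ID reading of the question) rather than NaN, a sketch with DataFrame.assign could look like this. The frame is rebuilt from the sample values so the snippet is self-contained. Unlike the concat output above, LN_ID keeps its integer dtype here.

```python
import pandas as pd

# A small frame shaped like the question's data (values copied from the sample)
dte = pd.DataFrame(
    {"LN_ID": [1234, 4567, 8888, 9999], "LN_DT": ["10/01/2020"] * 4}
)

# assign() returns a new DataFrame with the extra columns appended;
# here they are filled with copies of the existing columns instead of NaN
out = dte.assign(new_ln_id=dte["LN_ID"], new_ln_dt=dte["LN_DT"])
print(out)
```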

Replace column names

dte = dte.rename(columns={"LN_ID": "new_ln_id", "LN_DT": "new_ln_dt"})
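One detail worth spelling out: by default rename() does not change the DataFrame it is called on; it returns a renamed copy. If the result is not assigned back (or inplace=True is not passed), the original names stay. A small demonstration:

```python
import pandas as pd

dte = pd.DataFrame({"LN_ID": [1234], "LN_DT": ["10/01/2020"]})
mapping = {"LN_ID": "new_ln_id", "LN_DT": "new_ln_dt"}

renamed = dte.rename(columns=mapping)
# rename() left dte untouched and returned a renamed copy
print(dte.columns.tolist())      # ['LN_ID', 'LN_DT']
print(renamed.columns.tolist())  # ['new_ln_id', 'new_ln_dt']

# To keep the new names, assign the result back (inplace=True also works)
dte = dte.rename(columns=mapping)
```

By default rename() also silently ignores keys that are not present; passing errors="raise" makes a misspelled column name fail loudly instead.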

Thanks for your response, and sorry for the confusion. I want to rename the 2 columns. But first I want to check whether it's a .dat or .txt file, based on a Unix environment variable called 'filetype'.

For example: if filetype is 'TXT' or 'DAT', read the corresponding input file ('abc.txt' or 'abc.dat') into a new pandas DataFrame and rename the 2 columns. I hope that's clear.
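A minimal sketch of the "check the environment variable first" step, assuming the variable holds DAT or TXT in any case. The helper name get_filetype is hypothetical. It uses os.environ.get so that an unset variable produces a clear message, and it validates the value before any file is opened.

```python
import os


def get_filetype(var_name: str = "filetype") -> str:
    """Return the file type from an environment variable, lower-cased.

    var_name defaults to 'filetype' (the name used in the question); pass
    whichever name your shell actually exports, for example 'TYPE'.
    """
    value = os.environ.get(var_name)
    if value is None:
        raise KeyError(f"environment variable {var_name!r} is not set")
    value = value.strip().lower()
    if value not in {"dat", "txt"}:
        raise ValueError(f"unsupported file type {value!r}; expected DAT or TXT")
    return value
```

In the shell this would be used as, for example, `export filetype=DAT` before starting Python.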

Here is what I did. I've created a function that checks whether the filetype is "dat" or "txt", reads the file into a pandas DataFrame, and then renames the 2 columns. The function loads the data, but it doesn't rename the columns as required. I'd appreciate it if anyone could point out what I'm missing.

    import os
    import pandas as pd

    filetype = os.environ['TYPE']
    print(filetype)  # prints: DAT

    def load(file_type):
        if file_type.lower()=="dat":
            df=pd.read_csv(input_file, sep=',',engine='python')
            if df.columns[0]=="LN_ID":
                df.columns[0]="new_ln_id"
            if df.columns[1]=="LN_DT":
                df.columns[1]="new_ln_dt"
            return(df)
        else:
            if file_type.lower()=="txt":
                df=pd.read_csv("infile",sep=",",engine='python')
                if df.columns[0]=="LN_ID":
                    df.columns[0]="new_ln_id"
                if df.columns[1]=="LN_DT":
                    df.columns[1]="new_ln_dt"
            return(df)
    
    load(filetype)
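A few things in the function above likely explain the behaviour. In pandas, a DataFrame's columns Index is immutable, so `df.columns[0] = "new_ln_id"` raises a TypeError instead of renaming. The txt branch reads the literal string "infile" rather than a variable. And `load(filetype)` discards the returned DataFrame. A corrected sketch follows; the file paths are the hypothetical 'abc.dat' / 'abc.txt' from the comment above and are passed in as a parameter so they can be changed.

```python
import pandas as pd

# Hypothetical default paths, taken from the 'abc.dat' / 'abc.txt' example
INPUT_FILES = {"dat": "abc.dat", "txt": "abc.txt"}
NEW_NAMES = {"LN_ID": "new_ln_id", "LN_DT": "new_ln_dt"}


def load(file_type: str, input_files: dict = INPUT_FILES) -> pd.DataFrame:
    """Read the .dat or .txt input chosen by file_type and rename its columns."""
    key = file_type.strip().lower()
    if key not in input_files:
        raise ValueError(f"unsupported file type {file_type!r}; expected DAT or TXT")
    # Both formats are comma-separated in the question, so one read_csv call serves both
    df = pd.read_csv(input_files[key], sep=",")
    # Index objects are immutable (df.columns[0] = ... raises TypeError),
    # so build a renamed copy with rename() and return that
    return df.rename(columns=NEW_NAMES)


# Keep the returned DataFrame; calling load(filetype) on its own discards it:
# df = load(os.environ["TYPE"])
```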

Alternative

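from os import listdir
from os.path import isfile, join
import pandas as pd

path = "."  # directory containing the input file (adjust as needed)
onlyfiles = [f for f in listdir(path) if isfile(join(path, f))]
filename = join(path, onlyfiles[0])
# .txt and .dat are both comma-separated here, so they are read the same way
if filename.lower().endswith(('.txt', '.dat')):
    dte = pd.read_csv(filename, sep=",")
else:
    raise ValueError(f"Unsupported file type: {filename}")
# rename() returns a new DataFrame, so assign the result back
dte = dte.rename(columns={"LN_ID": "new_ln_id", "LN_DT": "new_ln_dt"})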
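A variation that ties the directory scan to the 'filetype' variable: instead of taking whichever file listdir happens to return first, it picks a file whose extension matches the requested type. This is a sketch under the assumption that the directory holds at most one relevant file per type; the function name load_by_suffix is hypothetical, and it uses pathlib in place of os.listdir.

```python
from pathlib import Path

import pandas as pd


def load_by_suffix(directory: str, file_type: str) -> pd.DataFrame:
    """Read the first file in `directory` whose suffix matches file_type (DAT/TXT)."""
    suffix = "." + file_type.strip().lower()
    # sorted() makes the choice deterministic; directory listing order is arbitrary
    matches = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )
    if not matches:
        raise FileNotFoundError(f"no {suffix} file found in {directory}")
    df = pd.read_csv(matches[0], sep=",")
    return df.rename(columns={"LN_ID": "new_ln_id", "LN_DT": "new_ln_dt"})


# Example call (requires `import os`):
# dte = load_by_suffix(".", os.environ["filetype"])
```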