How to handle typos when cleansing a text file?


I'm trying to clean a text file in Python. I noticed that the file I'm reading has several typos (e.g., "chevroelt" instead of "chevrolet"). I have a specific list of typos that I'd like to correct. How should I make these edits while reading the input file and writing to a new (clean) output file? Below is the code I have written to read the original text file and write a new (clean) file. I appreciate any help in advance!

    def _clean_data(self):
        ifname = AutoMPGData.DATA_FILE_ORIG
        ofname = AutoMPGData.DATA_FILE_CLEAN
        with open(ifname, 'r') as ifile:
            with open(ofname, 'w') as ofile:
                for line in ifile:
                    ofile.write(line.expandtabs()) 

There is 1 best solution below


If you have a list of specific typos you'd like to fix, I would create a dictionary (a map, not a tuple) with each typo as the key and its correct spelling as the value, then do something like this (pseudocode):

for each word in file:
    if word is a key in the map:
        word = map[word]    # replace the typo with its correct spelling
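
In Python, the idea above could look roughly like the sketch below. It is only one way to do it, under a few assumptions: `TYPO_FIXES`, `fix_typos`, and `clean_file` are hypothetical names, the second dictionary entry is a made-up example, and matching is case-sensitive. Instead of splitting each line into words, it uses a regular expression with word boundaries, so the original spacing and quoting are left untouched and typos sitting next to punctuation are still caught.

```python
import re

# Hypothetical typo map: misspelling -> correct spelling.
# Replace these entries with your own list of known typos.
TYPO_FIXES = {
    "chevroelt": "chevrolet",
    "toyouta": "toyota",
}

# One pattern that matches any known typo as a whole word.
_TYPO_RE = re.compile(
    r"\b(" + "|".join(re.escape(typo) for typo in TYPO_FIXES) + r")\b"
)


def fix_typos(line):
    """Return the line with every known typo replaced by its correction."""
    return _TYPO_RE.sub(lambda match: TYPO_FIXES[match.group(1)], line)


def clean_file(ifname, ofname):
    """Copy ifname to ofname, expanding tabs and fixing known typos."""
    with open(ifname, "r") as ifile, open(ofname, "w") as ofile:
        for line in ifile:
            ofile.write(fix_typos(line.expandtabs()))
```

Inside `_clean_data`, the write call would then become `ofile.write(fix_typos(line.expandtabs()))`. If the typos can appear with different capitalization, passing `re.IGNORECASE` to `re.compile` and lowercasing the matched text before the dictionary lookup is one option, though the replacement would then always be lowercase.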