Replacing a Character in .csv file only for specific strings

510 Views Asked by At

I am trying to clean a file and have removed the majority of unnecessary data excluding this one issue. The file I am cleaning is made up of rows containing numbers, see below example of a few rows.

[Example of data][1] [1]: https://i.stack.imgur.com/0bADX.png

You can see that I have cleaned the data so that there is a space between each character aside from the four characters that start each row. There are some character groupings that I have not yet added a space between each character because I need to replace the "1"s with a space rather than keeping the "1"s.

Strings I still need to clean2: https://i.stack.imgur.com/gmeUs.png

I have tried the following two methods in order to replace the 1's in these specific strings, but both produce results that I do not want.

Method 1 - Replacing 1's before splitting characters into their own columns

Data2 = pd.read_csv(filename.csv)
Data2['Column']=Data2['Column'].apply(lambda x: x.replace('1',' ') if len(x)>4 else x)

This method results in the replacement of every 1 in the entire file, not just the 1's in the strings like those pictured above (formatted like "8181818"). I would think that the if statement would excluded the removal of the 1's where there are less than 4 characters grouped together.

Method 2 - Replacing 1's after splitting characters into their own columns

Since Method 1 was resulting in the removal of each 1 in the file, I figured I could split each string into its own column (essentially using the spaces as a delimiter) and then try a similar method to clean these unnecessary 1's by focusing on the specific columns where these strings are located (columns 89, 951, and 961).

Data2[89]=Data2[89].apply(lambda x: x.replace('1',' ') if len(x)!=1 else x)
Data2[89].str.split(' ').tolist()
Data2[89] = pd.DataFrame(Data2[89].str.split(' ').tolist())
Data2[951]=Data2[951].apply(lambda x: x.replace('1',' ') if len(x)!=1 else x)
Data2[951].str.split(' ').tolist()
Data2[951] = pd.DataFrame(Data2[951].str.split(' ').tolist())
Data2[961]=Data2[961].apply(lambda x: x.replace('1',' ') if len(x)!=1 else x)
Data2[961].str.split(' ').tolist()
Data2[961] = pd.DataFrame(Data2[961].str.split(' ').tolist())

This method successfully removed only the 1's in these strings, but when I am then splitting the numbers I am keeping from these strings into their own columns they are overwriting the existing values in those columns rather than pushing those existing values into columns further down the line.

Any assistance on either of these methods or advice on if there is a different approach I should be taking would be much appreciated.

0

There are 0 best solutions below