Erratic behaviour when removing whitespace from the end of a string while importing excel table

49 Views Asked by At

I am importing an excel file with whitespaces at the end of most cell content which need removing. The following script works with sample data:

import pandas as pd

def strip(text):
    try:
        return text.strip()
    except AttributeError:
        return text

def num_strip(text):
    try:
        return text.split(" ",1)[0]
    except AttributeError:
        return text

def parse_excel_sheet(input_file, sheet):
    df = pd.read_excel(
        input_file,
        sheetname= sheet,
        parse_cols = 'A,B,C',            
        names=['ID', 'name_ITA', 'name_ENG'],
        converters = {
            'ID' : num_strip,
            'name1' : strip,
            'name2' : strip,
            }
        )
    return df

file = 'http://www.camminiepercorsi.com/wp-content/uploads/excel_test/excel_test.xlsx'
df = parse_excel_sheet(file,'1')
print(df)

however when trying the script on a larger file, parsing the first column 'ID' does not remove whitespaces.

file = 'http://www.camminiepercorsi.com/wp-content/uploads/excel_test/DRS_IL_startingpoint.xlsx'
df = parse_excel_sheet(file,'test')
print(df)
1

There are 1 best solutions below

3
On

I just run your code and found that whitespaces were correctly removed from column 'ID' in larger file:

for i, el in enumerate(df['ID'].values):
# print(i)
if " " in el:
    print(el)

returns no element from 'ID' column: there's no whitespace in these 28 elements. How did you checked that this was not the case?