I am importing an excel file with whitespaces at the end of most cell content which need removing. The following script works with sample data:
import pandas as pd
def strip(text):
try:
return text.strip()
except AttributeError:
return text
def num_strip(text):
try:
return text.split(" ",1)[0]
except AttributeError:
return text
def parse_excel_sheet(input_file, sheet):
df = pd.read_excel(
input_file,
sheetname= sheet,
parse_cols = 'A,B,C',
names=['ID', 'name_ITA', 'name_ENG'],
converters = {
'ID' : num_strip,
'name1' : strip,
'name2' : strip,
}
)
return df
file = 'http://www.camminiepercorsi.com/wp-content/uploads/excel_test/excel_test.xlsx'
df = parse_excel_sheet(file,'1')
print(df)
however when trying the script on a larger file, parsing the first column 'ID'
does not remove whitespaces.
file = 'http://www.camminiepercorsi.com/wp-content/uploads/excel_test/DRS_IL_startingpoint.xlsx'
df = parse_excel_sheet(file,'test')
print(df)
I just run your code and found that whitespaces were correctly removed from column 'ID' in larger file:
returns no element from 'ID' column: there's no whitespace in these 28 elements. How did you checked that this was not the case?