Reading a .xls file with pandas read_excel fails, reporting it as an .xlsb file

I'm trying to read several .xls files, stored in a NAS folder, from Apache Airflow, using the pandas read_excel function.

This is the code I'm using:

import pandas as pd

df = pd.read_excel('folder/sub_folder_1/sub_folder_2/file_name.xls', sheet_name='April', usecols=[0,1,2,3], dtype=str, engine='xlrd')

This worked for a time, but recently I have been getting this error for several of those files:

Excel 2007 xlsb file; not supported

[...]

xlrd.biffh.XLRDError: Excel 2007 xlsb file; not supported

These files are clearly .xls files, yet my code seems to detect them as .xlsb files, which are not supported. I would prefer a way to specify that they are .xls files or, alternatively, a way to read .xlsb files.

I'm not sure whether this is relevant, but these files are updated by an external team, who may have changed some property of the files without my knowledge. If that were the case, though, I would expect a different error.

There is 1 best solution below

Try:

import pandas as pd

# openpyxl must be installed; pandas loads it when engine='openpyxl'
xls = pd.ExcelFile('data.xls', engine='openpyxl')
df = pd.read_excel(xls)

xlrd recently dropped support for every format except legacy .xls, so it can no longer read newer Excel formats such as .xlsx.
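If the error message is accurate, the files may really be binary .xlsb workbooks that were saved with an .xls extension. In that case openpyxl would likely not open them either, since it targets .xlsx/.xlsm, whereas pandas documents a separate 'pyxlsb' engine for .xlsb. The following is only a rough sketch, assuming the format can be told apart by inspecting the file contents rather than the extension. The helper names guess_excel_engine and read_any_excel are made up for illustration, and the pyxlsb package would need to be installed for the .xlsb case.

```python
import zipfile

# First 8 bytes of an OLE2 compound file, the container used by legacy .xls
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def guess_excel_engine(path):
    """Guess a pandas engine name from the file contents (hypothetical helper).

    Returns 'xlrd', 'pyxlsb', 'openpyxl', or None if the format is not recognized.
    """
    with open(path, "rb") as fh:
        header = fh.read(8)
    if header == OLE2_MAGIC:
        return "xlrd"  # genuine legacy .xls

    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            names = set(zf.namelist())
        if "xl/workbook.bin" in names:
            return "pyxlsb"  # binary workbook (.xlsb)
        if "xl/workbook.xml" in names:
            return "openpyxl"  # XML workbook (.xlsx / .xlsm)
    return None


def read_any_excel(path, **kwargs):
    """Read a workbook with whichever engine matches its contents (hypothetical helper)."""
    import pandas as pd  # imported here so the detection helper stays stdlib-only

    engine = guess_excel_engine(path)
    if engine is None:
        raise ValueError(f"Unrecognized Excel format: {path}")
    return pd.read_excel(path, engine=engine, **kwargs)
```

With this in place, the call from the question could become read_any_excel('folder/sub_folder_1/sub_folder_2/file_name.xls', sheet_name='April', usecols=[0,1,2,3], dtype=str), and files that are still genuine .xls would keep going through xlrd.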