I have a spreadsheet in Microsoft Excel 97-2003 .xls format. I tried the following:
import pandas as pd
xlsx_file_path = "C:/temp/a_file.xls"
sheets_dict = pd.read_excel(xlsx_file_path, engine='xlrd', sheet_name=None)
for sheet_name, df_in in sheets_dict.items():
    print(sheet_name)
It gives the following error:
File C:\xxxxxx\site-packages\xlrd\__init__.py:172 in open_workbook
bk = open_workbook_xls(
File C:\xxxxxxx\site-packages\xlrd\book.py:79 in open_workbook_xls
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File C:\xxxxxxxx\site-packages\xlrd\book.py:1284 in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File C:\xxxxxxxx\site-packages\xlrd\book.py:1278 in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xef\xbb\xbf\xef\xbb\xbf<?'
I tried other engines such as openpyxl and got the following error:
File C:\xxxx\lib\zipfile.py:1336 in _RealGetContents
raise BadZipFile("File is not a zip file")
BadZipFile: File is not a zip file
Is there any workaround?
An XLS file is a binary file, not a ZIP archive. That is why you get a ZIP error when you use the openpyxl engine, which expects the zipped XLSX format. You can omit the engine argument and let pandas select one for you.
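To see which format a file really is regardless of its extension, you can inspect its first bytes. The sketch below is only an illustration: `sniff_spreadsheet_format` and `sniff_file` are hypothetical helper names, and the fallback to "text" is an assumption rather than a real CSV check. A true binary .xls starts with the OLE2 signature and an .xlsx starts with the ZIP signature, while the bytes in your traceback (`\xef\xbb\xbf\xef\xbb\xbf<?`) are two UTF-8 BOMs followed by the start of an XML declaration.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls container
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx is a ZIP archive
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_spreadsheet_format(head: bytes) -> str:
    """Guess the real format from the first bytes of a file (hypothetical helper)."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # Skip any number of leading UTF-8 BOMs, then leading whitespace.
    while head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"<"):
        return "xml"
    # Assumption: anything else is treated as delimited text such as CSV.
    return "text"


def sniff_file(path: str) -> str:
    """Read only the first 64 bytes of the file and classify them."""
    with open(path, "rb") as fh:
        return sniff_spreadsheet_format(fh.read(64))
```

Calling `sniff_file("C:/temp/a_file.xls")` on the file from the question should tell you whether xlrd, openpyxl, or neither is the right tool.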
Looking into the Microsoft Excel 97-2003 XML spreadsheet problem, I've developed a reader based on this documentation: https://en.wikipedia.org/wiki/Microsoft_Office_XML_formats
@edited 1 Based on the error, you probably have a CSV file named as an XLS file. Try changing the read method to one that reads CSV, such as pd.read_csv.
@edited 2 After receiving the link to the specific file, I improved the answer by developing an MS-XML 2003 Spreadsheet reader. I also cleaned up the externally generated XML file. As a result, the code now handles XLSX, XLS, MS-XML, and CSV files. You can pass in the spreadsheet along with the line at which importing into the pandas DataFrame should start.
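For reference, here is a minimal sketch of how an MS-XML 2003 (SpreadsheetML) file could be parsed with only the standard library. It is not the full reader described above: `read_spreadsheetml` is a hypothetical name, every cell value is returned as a string, and it assumes the only cleanup needed is dropping the repeated UTF-8 BOMs seen in the traceback.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"  # SpreadsheetML 2003 namespace
NS = {"ss": SS}
UTF8_BOM = b"\xef\xbb\xbf"


def read_spreadsheetml(raw: bytes) -> dict:
    """Return {sheet name: list of rows}; each row is a list of cell strings."""
    # Drop every leading BOM (the file in the question has two of them).
    while raw.startswith(UTF8_BOM):
        raw = raw[len(UTF8_BOM):]
    root = ET.fromstring(raw.lstrip())

    sheets = {}
    for worksheet in root.findall("ss:Worksheet", NS):
        name = worksheet.get(f"{{{SS}}}Name")
        rows = []
        for row in worksheet.findall("ss:Table/ss:Row", NS):
            values = []
            for cell in row.findall("ss:Cell", NS):
                # ss:Index is a 1-based column number used when empty cells are skipped.
                index = cell.get(f"{{{SS}}}Index")
                if index is not None:
                    values.extend([None] * (int(index) - 1 - len(values)))
                data = cell.find("ss:Data", NS)
                values.append(data.text if data is not None else None)
            rows.append(values)
        sheets[name] = rows
    return sheets
```

To get DataFrames, read the file in binary mode, pass the bytes to `read_spreadsheetml`, and build each frame with something like `pd.DataFrame(rows[1:], columns=rows[0])`, assuming the first row holds the headers. Numeric columns would still need converting, for example with `pd.to_numeric`.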