Error reading xls file with pd.read_excel: ValueError: Excel file format cannot be determined, you must specify an engine manually


I tried to run a simple script that reads an .xls file, and it returned the error in the title.

When I try to run this code:

import pandas as pd
from datetime import *
import os

print('Reading Sheets')
df_ccc_lll_22 = pd.read_excel(r'U:\OOO\CCCO\PPP CCC HHH.xls')

It returns this error:

Exception has occurred: ValueError
Excel file format cannot be determined, you must specify an engine manually.
  File "U:\OOO\CCCO\teste.py", line 15, in <module>
    df_ccc_lll_22 = pd.read_excel(r'U:\OOO\CCCO\PPP CCC HHH.xls')
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Excel file format cannot be determined, you must specify an engine manually.

I don't know what to do. I tried openpyxl, and it also returns an error saying the file is not a zip file. xlrd didn't work either.


There is 1 best solution below

Yuri R On

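For files with the .xls extension, the library typically used is xlrd.

Install it and specify xlrd as the engine when calling read_excel. Recent pandas releases require xlrd 2.0.1 or newer, so upgrade rather than pinning the older 1.2.0:

pip install --upgrade xlrd

df_ccc_lll_22 = pd.read_excel(r'U:\OOO\CCCO\PPP CCC HHH.xls', engine='xlrd')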

If the file is not a genuine .xls file (even if it has an .xls extension), you may need to open it in Excel (or another spreadsheet program) and save it as an .xlsx file. Then you can read it using the openpyxl engine:

df_ccc_lll_22 = pd.read_excel(r'U:\OOO\CCCO\PPP CCC HHH.xlsx', engine='openpyxl')
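Before converting anything, it may help to check what the file really is. As far as I know, pandas raises this ValueError when the first bytes of the file match neither the legacy .xls container nor the ZIP container used by .xlsx, which would also explain why openpyxl complains that the file is not a zip file. Below is a minimal sketch of that check using only the standard library. The helper name `sniff_spreadsheet_format` and its return labels are made up for illustration; they are not part of pandas.

```python
# Leading bytes ("magic numbers") of the two real Excel containers.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls (OLE2)
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx is a ZIP archive


def sniff_spreadsheet_format(path):
    """Guess a file's real format from its first bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(2048)

    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"

    # Drop a UTF-8 byte-order mark and leading whitespace before
    # looking for markup at the start of the file.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    text = head.lstrip().lower()
    if text.startswith((b"<!doctype", b"<html", b"<table", b"<?xml")):
        return "html-or-xml"

    return "text-or-unknown"


if __name__ == "__main__":
    # Path taken from the question; adjust to your own file.
    print(sniff_spreadsheet_format(r"U:\OOO\CCCO\PPP CCC HHH.xls"))
```

If this prints "html-or-xml", the file is probably a web export saved with an .xls extension, and pd.read_html(path) may be able to parse it (it returns a list of DataFrames and needs lxml or BeautifulSoup installed). If it prints "text-or-unknown", opening the file in a text editor should show whether it is a CSV or tab-separated export that pd.read_csv can handle.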