I have upgraded pandas package, new version is : 1.4.2 and xlrd package, new version is 2.0.1
Now, when I read python file with the following command :
import pandas as pd pd.read_excel('myfile.xlsx')
I got the following warning: UserWarning: Workbook contains no default style, apply openpyxl's default
And my result is :
foo | bar |
---|---|
_x0031_01259 | COMMUNAUTE_x0020_DE_x0020_COMMUNES_x0020_FIER_x0020_ET_x0020_USSES |
While it should be :
foo | bar |
---|---|
101259 | COMMUNAUTE DE COMMUNES FIER ET USSES |
So in some columns, 1 is replaced by x0031, 2 is replaced by x0033, space is replaced by x0020, etc.
I tried to add engine parameter set to openpyxl but same warning message and same dataframe result.
Before Pandas package upgrade, I already had the problem but with engine parameter set to xlrd, it was working (but I had a warning saying that newest version won't support xlrd)
Any idea how to read correctly the file?
xlrd
package is not used for.xlsx
files inpandas
.As you can read in pandas docs, for
.xlsx
it is usingopenpyxl
that does not handle readingxlsx
properly in some specific scenarios, causing error you have described - check this bug reportIt has been working for you before, because
xlrd
used to support.xlsx
files, but since version 2.0.0 it supports only "classic".xls
files.As a workaround, you can use
xlrd
code snippet from 1.x.x version in pandas, however you have to ensure your columns does not include strings like_x0000_
that you want to preserve: