I am trying to concatenate a series of excel files with Pandas. I created a for loop to read in all file for a provided directory and I specified the sheet name alongside the columns I wanted to read in. Once I read those files I concatenated them as a DataFrame variable which I named 'list_of_df'.
I have tested my code up to the point where they are concatenated and I can see the error is initialized once I begin to concatenate the various excel sheets. Currently I am getting the error 'File is not a zip file'
import pandas as pd
import glob
df = []
list_of_df = pd.DataFrame()
for filename in all_files:
df = pd.read_excel(filename,
sheet_name = 'cycle',
usecols=['Chg. Cap.(mAh)','DChg. Cap.(mAh)'],
engine = 'openpyxl')
list_of_df = pd.concat([list_of_df. df], ignore_index = True)
print(list_of_df)
with pd.ExcelWriter('output3.xlsx', engine = 'openpyxl', mode = 'a',) as writer:
df.to_excel(writer, sheet_name = 'output3')
This is the error raised:
Traceback (most recent call last):
File ~\AppData\Local\anaconda3\Lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
exec(code, globals, locals)
File c:\documents\python scripts\chg._and_dchg._curves_for_excel5.py:30
df = pd.read_excel(filename, sheet_name = 'cycle', usecols=['Chg. Cap.(mAh)','DChg. Cap.(mAh)'], engine = 'openpyxl')
File ~\AppData\Local\anaconda3\Lib\site-packages\pandas\util\_decorators.py:211 in wrapper
return func(*args, **kwargs)
File ~\AppData\Local\anaconda3\Lib\site-packages\pandas\util\_decorators.py:331 in wrapper
return func(*args, **kwargs)
File ~\AppData\Local\anaconda3\Lib\site-packages\pandas\io\excel\_base.py:482 in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File ~\AppData\Local\anaconda3\Lib\site-packages\pandas\io\excel\_base.py:1695 in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File ~\AppData\Local\anaconda3\Lib\site-packages\pandas\io\excel\_openpyxl.py:557 in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File ~\AppData\Local\anaconda3\Lib\site-packages\pandas\io\excel\_base.py:545 in __init__
self.book = self.load_workbook(self.handles.handle)
File ~\AppData\Local\anaconda3\Lib\site-packages\pandas\io\excel\_openpyxl.py:568 in load_workbook
return load_workbook(
File ~\AppData\Local\anaconda3\Lib\site-packages\openpyxl\reader\excel.py:315 in load_workbook
reader = ExcelReader(filename, read_only, keep_vba,
File ~\AppData\Local\anaconda3\Lib\site-packages\openpyxl\reader\excel.py:124 in __init__
self.archive = _validate_archive(fn)
File ~\AppData\Local\anaconda3\Lib\site-packages\openpyxl\reader\excel.py:96 in _validate_archive
archive = ZipFile(filename, 'r')
File ~\AppData\Local\anaconda3\Lib\zipfile.py:1302 in __init__
self._RealGetContents()
File ~\AppData\Local\anaconda3\Lib\zipfile.py:1369 in _RealGetContents
raise BadZipFile("File is not a zip file")
BadZipFile: File is not a zip file
Summarizing the value from the comments:
df = pd.read_excel(filename, sheet_name = 'cycle', usecols=['Chg. Cap.(mAh)','DChg. Cap.(mAh)'], engine = 'openpyx), not the writing after concatenating, as you seemed to believe.File corruption might be too easy a culprit and no evidence was given. The sensitivity tag on the other hand looks well documented.
Based on this, to turn this post into a useful one, both for you and present/future people experiencing the same issue,