Reading XLSB (binary) file with Pandas read_excel using pyxlsb reads empty rows for some xlsb file

4.6k Views Asked by At

I'm trying to read binary Excel files using read_excel method in pandas with pyxlsb engine as below:

import pandas as pd
df = pd.read_excel('test.xlsb', engine='pyxlsb')

If the xlsb file is like this file (Right now, I'm sharing this file via WeTransfer, but if there is a better way to share files on StackOverflow, let me know), the returned dataframe is filled with NaN's. I suspected that it might be because the file was saved with active cell pointing at the empty cells after the data originally. So I tried this:

import pandas as pd
with open('test.xlsb', 'rb') as data:
    data.seek(0,0)
    df = pd.read_excel(data, engine='pyxlsb')

but it still doesn't seem to work. I also tried reading the data from byte number 0 (from the beginning), writing it into a new file, 'test_1.xlsb', and finally reading it with pandas, but that doesn't work.

with open('test.xlsb','rb') as data:
    data.seek(0,0)
    with open('test_1.xlsb','wb') as outfile:
        outfile.write(data.read())
df = pd.read_excel('test_1.xlsb', engine='pyxlsb')

If anyone has suggestion as to what might be going on and how to resolve it, I'd greatly appreciate the help.

0

There are 0 best solutions below