I am parsing a PDF with tabula-py. I need to ignore the first two tables, parse the remaining tables as one, and export the result to a CSV file. In the first relevant table (index 2), the first row is a header row, and I want to leave it out of the CSV.
See my code below, including my attempt at dropping the relevant row from the pandas DataFrame.
What is the easiest/most elegant way of achieving this?
import tabula

tables = tabula.read_pdf('input.pdf', pages='all', multiple_tables=True)
f = open('output.csv', 'w')
# tables[2].drop(index=0) # tried this, but makes no difference
for df in tables[2:]:
    # append each remaining table to the same CSV file
    df.to_csv(f, index=False, sep=';')
f.close()
Given the following toy dataframes:
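As a minimal sketch, assume two small dataframes collected in a list, the way tabula returns them. The names `tables`, `a`, and `b` and all the values are made up for illustration:

```python
import pandas as pd

# Hypothetical toy data standing in for the tables returned by tabula.read_pdf
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [10, 20, 30], "b": [40, 50, 60]})

# tabula returns a list of dataframes, so the toy frames are kept in a list too
tables = [df1, df2]
```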
You can drop the first row of the second dataframe either by reassigning the resulting dataframe (the preferable way) or in place (inplace=True). In both cases the result is the same.
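A minimal sketch of both options, using made-up toy data (the helper `make_tables` and the column names are hypothetical). By default `DataFrame.drop` returns a new dataframe and leaves the original untouched, which is why calling it without keeping the result makes no difference:

```python
import pandas as pd


def make_tables():
    """Build hypothetical toy dataframes (stand-ins for the tabula output)."""
    first = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
    second = pd.DataFrame({"a": [10, 20, 30], "b": [40, 50, 60]})
    return [first, second]


# Option 1 (preferable): reassign the dataframe returned by drop
tables_reassigned = make_tables()
tables_reassigned[1] = tables_reassigned[1].drop(index=0)

# Option 2: modify the dataframe in place
tables_inplace = make_tables()
tables_inplace[1].drop(index=0, inplace=True)

# Either way, the row with index label 0 is gone; labels 1 and 2 remain
print(tables_reassigned[1])
print(tables_inplace[1])
```

Applied to the code in the question, this would presumably mean writing `tables[2] = tables[2].drop(index=0)` before the loop, assuming the header row really carries index label 0.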