How do I make sure not only the last element of a for loop dataframe is posted via to_sql to the database?

31 Views Asked by At

I try to post the dataframe to pgadmin. The dataframe was edited in a for loop, but when I post the data it posts only the last element in the for loop. I tried to use the to_sql inside the for loop and outside.

tables = camelot.read_pdf(pdf_path, pages = 'all', flavor = 'stream')
    for table in tables[:-1]:
         df = table.df
         df3 = pd.DataFrame()
         df3['nr'] = df2[0].values.astype(str).flatten().tolist()
         df3.to_sql('new_pilsCorner4', engine,if_exists='replace')
         engine = create_engine(f'postgresxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')

I tried to append the dataframe to an array and get a list, which I push into another dataframe outside the loop but that didn't work because of a missmatch between the columns.

1

There are 1 best solutions below

0
Serge de Gosson de Varennes On BEST ANSWER

You use replace which means you are ovrwritting the content in each run of your loop.

Do either this:

tables = camelot.read_pdf(pdf_path, pages = 'all', flavor = 'stream')
    for table in tables[:-1]:
         df = table.df
         df3 = pd.DataFrame()
         df3['nr'] = df2[0].values.astype(str).flatten().tolist()
         df3.to_sql('new_pilsCorner4', engine,if_exists='append')
         engine = create_engine(f'postgresxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')

or

tables = camelot.read_pdf(pdf_path, pages='all', flavor='stream')
engine = create_engine('postgresxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
for table in tables[:-1]:
    df = table.df
    df3 = pd.DataFrame()
    df3['nr'] = df2[0].values.astype(str).flatten().tolist()
    df3.to_sql('new_pilsCorner4', engine, if_exists='append', index=False)