column order relevant in df.to_sql(if_exists="append")?

54 Views Asked by At

I was wondering, whether df.to_sql("db_table", con=some_engine, if_exists="append") requires that the order of the columns of df (a dataframe) and db_table (a database table) has to be the same. There is nothing about it in the documentation, so I tried it out. The result is that the order doesn't have to be the same. However, where can I find out whether the order has to match, without trying an example code? How can I find the location where it is revealed what if_exists="append" does (for example, in PyCharm)?

1

There are 1 best solutions below

0
Parfait On

According to underlying pandas code for the SQL IO module, insertion of records are handled in an SQLAchemy Connection.execute call with parameters as a list of dictionaries:

def _execute_insert(self, conn, keys: list[str], data_iter) -> int:
    """
    Execute SQL statement inserting data

    Parameters
    ----------
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
       Column names
    data_iter : generator of list
       Each item contains a list of values to be inserted
    """
    data = [dict(zip(keys, row)) for row in data_iter]
    result = conn.execute(self.table.insert(), data)
    return result.rowcount

Additionally, the self.table.insert() noticeably omits the values() call where SQLAlchemy insert docs indicate values clause is dynamically rendered based on parameters passed within Connection.execute:

values – collection of values to be inserted; see Insert.values() for a description of allowed formats here. Can be omitted entirely; a Insert construct will also dynamically render the VALUES clause at execution time based on the parameters passed to Connection.execute().

Altogether, the DataFrame.to_sql uses named parameters (not positional parameters using ? or %s placeholders) in underlying insert-values SQL query. Therefore, order of DataFrame columns does not matter since the execution will dynamically align key-value pairs to corresponding columns of the database SQLAlchemy Table.