Python pandas 'to_sql' throws 'Invalid precision value' error with SQL Server


I run a Python script weekly that generates a .csv file, which I then manually upload to a Microsoft SQL Server database. I'm now trying to automate this process with the pandas to_sql function. However, when I do, I encounter the following error: "('HY104', '[HY104] [Microsoft][ODBC SQL Server Driver]Invalid precision value (0) (SQLBindParameter)')".

To try and identify the source of the problem, I simplified the process and tried to upload a dataframe with a single column to a new table with three columns:

  1. Id (datatype: INT, incremental number, non-nullable, primary key)
  2. RecordDate (datatype: datetime, default value: GETDATE(), non-nullable)
  3. MasterPartNo (datatype: varchar(35), non-nullable)
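For reference, the three columns can be expressed as a SQLAlchemy Table definition. This is a hypothetical reconstruction of the target table based on the list above, not the actual DDL from my database; compiling it against the mssql dialect just shows the CREATE TABLE it implies:

```python
# Hypothetical reconstruction of the target table described above.
from sqlalchemy import Table, Column, Integer, DateTime, String, MetaData, text
from sqlalchemy.schema import CreateTable
from sqlalchemy.dialects import mssql

metadata = MetaData()
test_table = Table(
    "test_table",
    metadata,
    Column("Id", Integer, primary_key=True, autoincrement=True, nullable=False),
    # GETDATE() is a server-side default, so it is passed as literal SQL text
    Column("RecordDate", DateTime, server_default=text("GETDATE()"), nullable=False),
    Column("MasterPartNo", String(35), nullable=False),
    schema="dbo",
)

# Compile the implied DDL against the SQL Server dialect (no connection needed)
ddl = str(CreateTable(test_table).compile(dialect=mssql.dialect()))
print(ddl)
```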

Here is the Python code I used to do this:

import pandas as pd
import Resources.SQL.Connection.Sql_Connection as sqlCon
from sqlalchemy.exc import DBAPIError
import sqlalchemy
import pyodbc
import functions.Output.DataframePrint as dataPrint


def sql_upload(upload_dataframe):
    try:
        upload_dataframe.to_sql(
            name='test_table',
            con=sqlCon.engine_db2,
            schema='dbo',
            if_exists='append',
            index=False
        )
    except DBAPIError as err:
        print(err.orig)
        print(err.statement)
        print(err.params)
        print("SQLAlchemy version:", sqlalchemy.__version__)
        print("pyodbc version:", pyodbc.version)
        print("pandas version:", pd.__version__)


test_data = {'MasterPartNo': ['TEST-TEST']}
dataset = pd.DataFrame(data=test_data)
dataset['MasterPartNo'] = dataset['MasterPartNo'].astype(str)

dataPrint.print_dataframe(dataframe=dataset)  # Formats and prints dataframe on the console.

print(dataset.dtypes)

sql_upload(upload_dataframe=dataset)

When running this code, I get the dataset printed as expected and then the following error message:

('HY104', '[HY104] [Microsoft][ODBC SQL Server Driver]Invalid precision value (0) (SQLBindParameter)')
SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME] 
FROM [INFORMATION_SCHEMA].[TABLES] 
WHERE ([INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) OR [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max))) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max))
('BASE TABLE', 'VIEW', 'test_table', 'dbo')
SQLAlchemy version: 2.0.19
pyodbc version: 4.0.39
pandas version: 2.0.3

I have tested the connection and can query that database without problems. I'm working in PyCharm, using SQLAlchemy with an ODBC connection.

Any help would be greatly appreciated.

Answer from Douglas Cady:
  1. I added explicit data types (via the dtype argument) when calling the to_sql function.
  2. I added a '#' in front of the table name (#test_table) to make it a temp table, though this may not be necessary in your case.
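Fix (1) can be sketched on its own. An in-memory SQLite database stands in for SQL Server here purely to keep the example runnable; the dtype argument is passed to to_sql the same way against an mssql+pyodbc engine:

```python
# Minimal sketch of passing an explicit dtype mapping to to_sql.
# SQLite is a stand-in for SQL Server so the example runs anywhere.
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.types import String

engine = create_engine("sqlite://")  # in-memory database
df = pd.DataFrame({"MasterPartNo": ["TEST-TEST"]})

# Without dtype, pandas lets the dialect pick a string type for object
# columns; String(35) pins the column to VARCHAR(35) instead.
df.to_sql(
    "test_table",
    con=engine,
    if_exists="replace",
    index=False,
    dtype={"MasterPartNo": String(35)},
)

back = pd.read_sql_query("SELECT MasterPartNo FROM test_table", engine)
print(back)
```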

Check this list of SQLAlchemy data types to know which ones to use for your situation.

I am connecting to SQL Server through pyodbc, like in your example.

""" 
Test Inserting a DataFrame into SQL Server database

"""

from sqlalchemy import create_engine, engine
from sqlalchemy.types import String, DateTime, Integer
import pandas as pd


def start_db_engine() -> engine.base.Engine:
    # username and password are defined elsewhere (e.g., your own config);
    # DSN is the ODBC data source name configured on your machine.
    connection_url = f"mssql+pyodbc://{username}:{password}@DSN"
    return create_engine(connection_url)


def main() -> None:
    engine = start_db_engine()
    
    with engine.connect() as conn:
        dataset = pd.DataFrame({
            'Id': [4, 5, 6],
            'RecordDate': ['08/23/2023', '08/24/2023', '08/25/2023'],
            'MasterPartNo': ['part1', 'part2', 'part3']
        })
        dataset['RecordDate'] = pd.to_datetime(dataset['RecordDate'])
        
        dtypes = {'Id': Integer, 
                  'RecordDate': DateTime, 
                  'MasterPartNo': String(35)}
        dataset.to_sql(name="#test_table", con=conn, if_exists='replace', index=False, dtype=dtypes)
        
        test_table = pd.read_sql_query("select * from #test_table", conn)
        print(test_table)
        print(dataset.equals(test_table))
        

if __name__ == '__main__':
    main()