How to handle NaTs with pandas, SQLAlchemy and psycopg2


I have a dataframe with NaTs that gives me a DataError: (psycopg2.DataError) invalid input syntax for type timestamp: "NaT" when I try inserting the values into a Postgres database.

The dataframe

import datetime

import pandas as pd
import sqlalchemy
from sqlalchemy import MetaData
from sqlalchemy.dialects.postgresql import insert

# Build a small test frame; the last row's timestamp is deliberately NaT
tst_df = pd.DataFrame({'colA': ['a', 'b', 'c', 'a', 'z', 'q'],
                       'colB': pd.date_range(end=datetime.datetime.now(), periods=6),
                       'colC': ['a1', 'b2', 'c3', 'a4', 'z5', 'q6']})
tst_df.loc[5, 'colB'] = pd.NaT

# Convert the frame to a list of row dicts and reflect the target table
insrt_vals = tst_df.to_dict(orient='records')
engine = sqlalchemy.create_engine("postgresql://user:password@localhost/postgres")
connect = engine.connect()
meta = MetaData(bind=engine)
meta.reflect(bind=engine)
table = meta.tables['tstbl']
insrt_stmnt = insert(table).values(insrt_vals)

# Skip rows that would violate the (colA, colB) unique constraint
do_nothing_stmt = insrt_stmnt.on_conflict_do_nothing(index_elements=['colA', 'colB'])

The code generating the error

results = engine.execute(do_nothing_stmt)

DataError: (psycopg2.DataError) invalid input syntax for type timestamp: "NaT"
LINE 1: ...6-12-18T09:54:05.046965'::timestamp, 'z5'), ('q', 'NaT'::tim...

One possibility mentioned here is to replace the NaTs with Nones, but as the previous author said, it seems a bit hackish.
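As the error output shows, NaT ends up serialized as the literal string 'NaT', which Postgres cannot cast to a timestamp. For reference, a minimal sketch of that NaT-to-None workaround, assuming the insrt_vals, table and engine objects built above:

clean_vals = [
    # pd.isnull is True for NaT (and NaN), so those values become None -> SQL NULL
    {key: (None if pd.isnull(value) else value) for key, value in row.items()}
    for row in insrt_vals
]

# Rebuild the statement from the cleaned records and execute it as before
do_nothing_stmt = insert(table).values(clean_vals).on_conflict_do_nothing(
    index_elements=['colA', 'colB'])
results = engine.execute(do_nothing_stmt)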

sqlalchemy 1.1.4
pandas  0.19.1
psycopg2 2.6.2 (dt dec pq3 ext lo64)
There is 1 answer below.


Did you try using the pandas to_sql method?

It works for me with a MySQL DB (I presume it'll also work for PostgreSQL):

In [50]: tst_df
Out[50]:
  colA                       colB colC
0    a 2016-12-14 19:11:36.045455   a1
1    b 2016-12-15 19:11:36.045455   b2
2    c 2016-12-16 19:11:36.045455   c3
3    a 2016-12-17 19:11:36.045455   a4
4    z 2016-12-18 19:11:36.045455   z5
5    q                        NaT   q6

In [51]: import pymysql
    ...: import sqlalchemy as sa
    ...:

In [52]: db_connection = 'mysql+pymysql://user:password@mysqlhost/db_name'
    ...:

In [53]: engine = sa.create_engine(db_connection)
    ...: conn = engine.connect()
    ...:

In [54]: tst_df.to_sql('zzz', conn, if_exists='replace', index=False)

On the MySQL side:

mysql> select * from zzz;
+------+---------------------+------+
| colA | colB                | colC |
+------+---------------------+------+
| a    | 2016-12-14 19:11:36 | a1   |
| b    | 2016-12-15 19:11:36 | b2   |
| c    | 2016-12-16 19:11:36 | c3   |
| a    | 2016-12-17 19:11:36 | a4   |
| z    | 2016-12-18 19:11:36 | z5   |
| q    | NULL                | q6   |
+------+---------------------+------+
6 rows in set (0.00 sec)

PS: unfortunately I don't have PostgreSQL for testing.
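For completeness, the equivalent call against PostgreSQL would look roughly like the sketch below (untested, so treat it as an assumption; the connection string and the table name 'tstbl' are placeholders). Note that to_sql does a plain write and won't apply the on_conflict_do_nothing behaviour from the question.

import pandas as pd
import sqlalchemy as sa

# Sketch only: write the frame to PostgreSQL; NaT should land as NULL,
# just as it becomes NULL in the MySQL output above.
pg_engine = sa.create_engine("postgresql+psycopg2://user:password@localhost/postgres")
tst_df.to_sql('tstbl', pg_engine, if_exists='append', index=False)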