Loading data to Netezza as a list is very slow


I have about a million records in a list that I would like to write to a Netezza table. I have been using the executemany() command with pyodbc, which is very slow (I can load much faster by saving the records to an Excel file and loading that into Netezza). Is there a faster alternative to loading the list with executemany()?

PS1: The list is generated by a proprietary DAG in our company, so writing to the list is very fast.

PS2: I have also tried splitting the list into chunks of 100 records and calling executemany() on each chunk. It takes approximately 60 seconds to load, which still seems very slow.
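
For concreteness, here is roughly what the chunked version looks like (the DSN, table, and column names are placeholders, not our real schema):

import pyodbc

# Stand-in for the real in-memory list (~1M records).
records = [(i, f"val{i}") for i in range(1_000_000)]

conn = pyodbc.connect("DSN=netezza_dsn")  # hypothetical DSN
cur = conn.cursor()
# cur.fast_executemany = True  # pyodbc feature; whether the Netezza ODBC
#                              # driver supports parameter arrays is untested

CHUNK = 100
sql = "INSERT INTO test_table (id, val) VALUES (?, ?)"
for start in range(0, len(records), CHUNK):
    cur.executemany(sql, records[start:start + CHUNK])
conn.commit()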


There are 2 answers below.


Netezza is built for bulk loads, whereas executemany() only inserts a batch of rows at a time. The best way to load millions of rows is the "nzload" utility, which can be scheduled from VBScript or an Excel macro on Windows, or from a shell script on Linux.
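
As a rough sketch of driving nzload from Python (host, credentials, and paths are placeholders; confirm the flags against nzload -h for your version):

import csv
import subprocess

# Stand-in for the real in-memory list.
records = [(1, "alpha"), (2, "beta")]

# 1. Dump the list to a flat file.
with open("/tmp/test.txt", "w", newline="") as f:
    csv.writer(f).writerows(records)

# 2. Hand the file to nzload for a bulk insert.
subprocess.run(
    ["nzload", "-host", "nzhost", "-u", "admin", "-pw", "password",
     "-db", "testdb", "-t", "test_table",
     "-df", "/tmp/test.txt", "-delim", ","],
    check=True,
)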


From Python I have had great performance loading millions of rows to Netezza using transient external tables. Basically Python creates a CSV file on the local machine, and then tells the ODBC driver to load the CSV file into the remote server.

The simplest example:

-- Load the local file into test_table through the ODBC connection;
-- REMOTESOURCE 'ODBC' tells the server the file lives on the client.
INSERT INTO test_table
SELECT *
FROM EXTERNAL '/tmp/test.txt'
SAMEAS test_table
USING (DELIM ',' REMOTESOURCE 'ODBC');

Behind the scenes this is equivalent to the nzload command, but it does not require nzload to be installed. This worked great for me on Windows, where I did not have nzload.
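
A sketch of the whole round trip from Python, assuming a pyodbc DSN named netezza_dsn and an existing target table test_table whose columns match the CSV:

import csv
import pyodbc

# Stand-in for the real in-memory list.
records = [(1, "alpha"), (2, "beta")]

# 1. Write the list to a CSV file on the local machine.
path = "/tmp/test.txt"
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(records)

# 2. Have the server pull the file back through the ODBC connection;
#    REMOTESOURCE 'ODBC' tells Netezza the file lives on the client.
conn = pyodbc.connect("DSN=netezza_dsn", autocommit=True)
conn.cursor().execute(f"""
    INSERT INTO test_table
    SELECT * FROM EXTERNAL '{path}'
    SAMEAS test_table
    USING (DELIM ',' REMOTESOURCE 'ODBC')
""")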

Caveat: be careful with the formatting of the CSV, the values in the CSV, and the options to the command. Netezza gives obscure error messages for invalid values.
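
For example, being explicit about nulls, quoting, and error handling can make failures easier to diagnose. The option names below are standard Netezza external-table options, but treat them as assumptions and verify them against your release:

# Hypothetical stricter variant of the load statement above.
load_sql = """
    INSERT INTO test_table
    SELECT * FROM EXTERNAL '/tmp/test.txt'
    SAMEAS test_table
    USING (
        DELIM ','
        REMOTESOURCE 'ODBC'
        NULLVALUE 'NULL'      -- the literal string NULL means SQL NULL
        QUOTEDVALUE 'DOUBLE'  -- fields may be wrapped in double quotes
        MAXERRORS 10          -- allow a few bad rows before the load aborts
        LOGDIR '/tmp'         -- .nzlog / .nzbad diagnostics land here
    )
"""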