Insert into in vertica performance tuning

1.3k Views Asked by At

I am very new to vertica. I have written a simple Insert into statement, but its taking hours long to execute for 1 million records.

like:

insert into abc(a,b,c)
select x,y,z from xyz
where x > 20;

is there anyway to improve the performance? there is sequencing in place for the target table.

Thanks in advance!!

2

There are 2 best solutions below

1
On
0
On

Based on the limited info, I'll give you two areas to look into (1 and 2 are the same area, just different alternatives depending on what you are doing).

  1. If you are using JDBC or ODBC, be sure to use batch sql. As long as you are using the JDBC or ODBC driver from Vertica, it will convert batch SQL into a more efficient COPY statement behind the scenes. Millions of statements isn't really something Vertica is good at, it's very good at bulk though.

  2. If you are not using JDBC or ODBC (vsql, vertica_python module, or other) then you'll need to stream to a COPY FROM STDIN or COPY LOCAL statement. The method for doing this depends on what method you are actually using to connect.

  3. If you are using a sequence, it is very important that the cache be set high. Every uncached sequence pull is a catalog lock. This will surely slow down your inserts enormously. If you used an identity column or auto_number, you can look up the behind-the-scenes associated sequence in the sequences view. Do something like ALTER SEQUENCEsequence_name CACHE 100000;.

Expect gaps in your sequence. This can happen really no matter if you use cache or not, but you're almost guaranteed to get them with cache.

I'll add one more:

  1. If possible, try to make it so the segmentation of the source and target are the same to avoid broadcast of data across the private network.