Handle duplicate rows during batch insert in MariaDB ColumnStorage

107 Views Asked by At

In my app I want to aggregate some data in staging InnoDB table and then copy that data to Columnstore table using INSER INTO ... SELECT FROM query. After that I truncate staging table and aggreagate new rows for copy. The problem is that in some occasions it could be some rows that already had been copied to columnar table. In this case I want to merge data and update it in columnar table. There is no any keys or constraints in both tables and 'ON DUPLICATE KEY' statement is ignored by MariaDB. What are the best practicies in this case?

UPD: I have two identical tables

CREATE TABLE table_row (
  id varchar(256) not null,
  ...
  changed_at timestamp not nul
) ENGINE=InnoDB

CREATE TABLE table_column (
  id varchar(256) not null,
  ...
  changed_at timestamp not nul
) ENGINE=Columnstore
  1. Aggregating data to table_row
  2. INSERT INTO table_column SELECT * FROM table_row
  3. TRUNCATE table_row
  4. repeat

On the second step I need to handle duplicated ID's and update changed_at column instead of inserting all rows into table_column table.

2

There are 2 best solutions below

0
Federico Razzoli On

As you said, you can't use UNIQUE keys with ColumnStore, so ON DUPLICATE KEY can't be used neither.

Can't you keep an InnoDB table with the ID's of the rows that should already exist in ColumnStore? Then you could do something like this:

  1. Aggregating data to table_row
  2. INSERT INTO processed_id SELECT id FROM table_row
  3. INSERT INTO table_column SELECT t.* FROM table_row t LEFT JOIN processed_id p ON t.id = p.id AND p.id IS NULL
  4. TRUNCATE table_row
  5. repeat
0
Sylvain Arbaudie On

You could do a variant like this :

CREATE TABLE table_row ( id varchar(256) not null unique, ... changed_at timestamp not nul ) ENGINE=InnoDB

  1. insert into table_row select id from table_column
  2. insert aggregated datas into table_row with unique constraint violation management
  3. insert into table_column select * from table_row ( or use cpimport which is way faster here )