sql: Fast way to check if data are already in the data base

129 Views Asked by At

I have an MSSQL database Table (named here: TABLE) with four columns (ID, lookup, date, value) and I want to check for a large amount of data whether they are in the database, using python. The data I want to add are here called: to_be_added with columns index, lookup, date, value.

To check whether the data already exist I use the following sql. It returns the index from the to_be_added data which are not yet in the database. I first check which lookup are in the database and then only perform the join on the subset (here called existing).

SELECT to_be_added."index",existing."ID" FROM
(
   (
     select * from dbo.TABLE
     where "lookup" in  (1,2,3,4,5,6,7,...)
   ) existing
right join 
    (
     select * from 
        ( Values
         (1, 1, 1/1/2000, 0.123),(2, 2, 1/2/2000, 0.456),(...,...,...)
        )t1(index,lookup,date,value)
    )to_be_added
 on existing.lookup = to_be_added.lookup
 and existing.date  = to_be_added.date
)
WHERE existing."ID" IS NULL

I do it batchwise as otherwhise the sql command is getting too large to commit and execution time is too long. As I have millions of lines to compare I am looking for a more efficent command as it becomes quite time consuming. Any help appreciated

1

There are 1 best solutions below

2
On

I would do the following:

  1. Load the data from Excel into a table in your DB e.g. table = to_be_added

  2. Run a query like this:

    SELECT a.index
    FROM to_be_added a
    LEFT OUTER JOIN existing e ON 
    a.lookup = e.lookup
    and a.date  = e.date
    WHERE e.lookup IS NULL;
    
  3. Ensure that table "existing" has an index on lookup+date