I have an MSSQL database Table (named here: TABLE) with four columns (ID, lookup, date, value) and I want to check for a large amount of data whether they are in the database, using python. The data I want to add are here called: to_be_added with columns index, lookup, date, value.
To check whether the data already exist I use the following sql. It returns the index from the to_be_added data which are not yet in the database. I first check which lookup are in the database and then only perform the join on the subset (here called existing).
SELECT to_be_added."index",existing."ID" FROM
(
(
select * from dbo.TABLE
where "lookup" in (1,2,3,4,5,6,7,...)
) existing
right join
(
select * from
( Values
(1, 1, 1/1/2000, 0.123),(2, 2, 1/2/2000, 0.456),(...,...,...)
)t1(index,lookup,date,value)
)to_be_added
on existing.lookup = to_be_added.lookup
and existing.date = to_be_added.date
)
WHERE existing."ID" IS NULL
I do it batchwise as otherwhise the sql command is getting too large to commit and execution time is too long. As I have millions of lines to compare I am looking for a more efficent command as it becomes quite time consuming. Any help appreciated
I would do the following:
Load the data from Excel into a table in your DB e.g. table = to_be_added
Run a query like this:
Ensure that table "existing" has an index on lookup+date