Optimization - is union the best way to go in this case?

52 Views Asked by At

I have a dataset which is updated on a daily basis from an external data source (Example). This data is then combined with some other in-house data (Scale) and is formed to a new table.

Example contains some figures that can be errornous at times. I register for which funds and date the data is incorrect in a separate table (Errors). I wish to take the data from another date that I know is correct and use this as a proxy.

I have made a small piece of code to illustrate my problem. So - the intention is to end up with a complete table containing historical data for all the single investments (that is, I look through the investment funds) in the different companies' portfolios. If any data for a fund has been delivered with errors, the data for the fund should be taken from another date (not necessarily the date before). The scaling should remain the same even though incorrect data is replaced by data from another date.

I have given it a lot of thought and found that the only way I seem to figure out a solution is to UNION two different selections, one with the original data excluding funds with incorrect data and one with the replacement data. But I feel like there should be an easier method to achieve what I need. My original data is both in tabels and views and are huge and a bit slow as it is right now, so I'm interested in finding the most efficient way to create the new table.

Thanks in advance! Line

CREATE TABLE Example(
   MarketDate   datetime NOT NULL,
   FundName VARCHAR(20) NOT NULL,
   SecurityName VARCHAR(48) NOT NULL,
   MarketValue  FLOAT(25),
   Risk  FLOAT(25),
);

CREATE TABLE tblScale(
   MarketDate   datetime NOT NULL,
   Entity VARCHAR (20)   NOT NULL,
   FundName VARCHAR(48) NOT NULL,
   Scale  FLOAT(25),
);

CREATE TABLE Errors(
   MarketDate datetime NOT NULL,
   RiskDate datetime NOT NULL,
   FundName VARCHAR(48) NOT NULL,
);


INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-15', 'Fund1', 'Bond1', 2000.00, 5 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-15', 'Fund1', 'Bond2', 1500.00, 4);
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-15', 'Fund1', 'Bond3', 1300.00, 3 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-15', 'Fund1', 'Bond4', 300.00, 109 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-15', 'Fund1', 'Bond5', 700.00, 400 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-15', 'Fund1', 'Bond6', 600.00, 350 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-14', 'Fund1', 'Bond1', 2100.00, 5.1 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-14', 'Fund1', 'Bond2', 1400.00, 4.2 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-14', 'Fund1', 'Bond3', 1330.00, 3.9 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-14', 'Fund1', 'Bond4', 200.00, 2.1 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-14', 'Fund1', 'Bond5', 400.00, 2.5 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-14', 'Fund1', 'Bond6', 500.00, 2.6 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-15', 'Fund2', 'Bond7', 1800.00, 3.5 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-15', 'Fund2', 'Bond8', 1900.00, 4.5);
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-15', 'Fund2', 'Bond9', 1300.00, 3 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-15', 'Fund2', 'Bond10', 350.00, 2.1 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-14', 'Fund2', 'Bond7', 1700.00, 3.4 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-14', 'Fund2', 'Bond8', 1810.00, 4.2 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-14', 'Fund2', 'Bond9', 1330.00, 3.4 );
INSERT INTO Example (MarketDate,FundName,SecurityName,MarketValue,Risk) VALUES ('2015-06-14', 'Fund2', 'Bond10', 320.00, 2.0 );

INSERT INTO tblScale (MarketDate,Entity,FundName,Scale) VALUES ('2015-06-15', 'Comp1', 'Fund1', 0.76 );
INSERT INTO tblScale (MarketDate,Entity,FundName,Scale) VALUES ('2015-06-15', 'Comp2', 'Fund1', 0.10 );
INSERT INTO tblScale (MarketDate,Entity,FundName,Scale) VALUES ('2015-06-15', 'Comp3', 'Fund1', 0.14 );
INSERT INTO tblScale (MarketDate,Entity,FundName,Scale) VALUES ('2015-06-15', 'Comp1', 'Fund2', 0.30 );
INSERT INTO tblScale (MarketDate,Entity,FundName,Scale) VALUES ('2015-06-15', 'Comp2', 'Fund2', 0.35 );
INSERT INTO tblScale (MarketDate,Entity,FundName,Scale) VALUES ('2015-06-15', 'Comp3', 'Fund2', 0.25 );
INSERT INTO tblScale (MarketDate,Entity,FundName,Scale) VALUES ('2015-06-14', 'Comp1', 'Fund1', 0.75 );
INSERT INTO tblScale (MarketDate,Entity,FundName,Scale) VALUES ('2015-06-14', 'Comp2', 'Fund1', 0.10 );
INSERT INTO tblScale (MarketDate,Entity,FundName,Scale) VALUES ('2015-06-14', 'Comp3', 'Fund1', 0.15 );
INSERT INTO tblScale (MarketDate,Entity,FundName,Scale) VALUES ('2015-06-14', 'Comp1', 'Fund2', 0.30 );
INSERT INTO tblScale (MarketDate,Entity,FundName,Scale) VALUES ('2015-06-14', 'Comp2', 'Fund2', 0.35 );
INSERT INTO tblScale (MarketDate,Entity,FundName,Scale) VALUES ('2015-06-14', 'Comp3', 'Fund2', 0.25 );

INSERT INTO Errors (MarketDate,RiskDate,FundName) VALUES ('2015-06-15', '2015-06-14', 'Fund1' );

Declare @Todate as datetime = '2015-06-15';

select data.MarketDate as MarketDate
       ,data.MarketDate as RiskDate
       ,scale.Entity 
       ,data.SecurityName
       ,sum(data.MarketValue)*scale.Scale as MV
from Example data
inner join tblScale scale
on data.MarketDate=scale.MarketDate
and data.FundName=scale.FundName
where data.MarketDate=@Todate
and data.FundName not in (select FundName from Errors where MarketDate=@Todate)
group by data.MarketDate,scale.Entity,data.SecurityName,scale.Scale

union

select  @Todate as MarketDate
       ,data.MarketDate as RiskDate
       ,scale.Entity 
       ,data.SecurityName
       ,sum(data.MarketValue)*scale.Scale as MV
from (select @Todate as Today,* from Example
      where fundname in (select fundname from Errors where marketdate=@todate)
      and marketdate in (select riskdate from Errors where marketdate=@todate)
     ) data
inner join tblScale scale
on data.FundName=scale.FundName
and data.Today=scale.MarketDate
where scale.MarketDate=@Todate
group by data.MarketDate
         ,scale.Entity
         ,data.SecurityName
         ,scale.Scale
         ,scale.MarketDate
1

There are 1 best solutions below

6
On BEST ANSWER

Well first of all, you can use UNION ALL, which will leave out the duplication check. This will be much faster.

Another suggestion is the IN() in your first SELECT. This could cause long runtimes if your subquery returns many rows.

I would suggest to redesign it from this:

select data.MarketDate as MarketDate
       ,data.MarketDate as RiskDate
       ,scale.Entity 
       ,data.SecurityName
       ,sum(data.MarketValue)*scale.Scale as MV
from Example data
INNER join tblScale scale
on data.MarketDate=scale.MarketDate
and data.FundName=scale.FundName
where data.MarketDate=@Todate
and data.FundName not in (select FundName from Errors where MarketDate=@Todate)
group by data.MarketDate,scale.Entity,data.SecurityName,scale.Scale

To this LEFT JOIN including the IS NULL Filter:

select data.MarketDate as MarketDate
       ,data.MarketDate as RiskDate
       ,scale.Entity 
       ,data.SecurityName
       ,sum(data.MarketValue)*scale.Scale as MV
from Example data
inner join tblScale scale
on data.MarketDate=scale.MarketDate
and data.FundName=scale.FundName
LEFT JOIN (select FundName from Errors where MarketDate=@Todate) as fundname
        ON data.FundName = fundname.FundName
where data.MarketDate=@Todate
and fundname.Fundname IS NULL
group by data.MarketDate,scale.Entity,data.SecurityName,scale.Scale