I'm not a expert on SQL and I'm working on a huge SQL code and unfortunately it has a CURSOR which handles another two nested CURSORS within it (totally three cursors inside a stored procedure), which handles millions of data to be DELETE,UPDATE and INSERT.
This takes a whole lot of time (more than 13 hrs and eventually gets an error 'cursor names datasets already exists') because of row by row execution and I'm trying to modify this in to SET based approach and I'm stuck how to do the INSERT,DELETE and UPDATE while converting into SET based approach.Or in other words I couldn't do all the INSERT,DELETE and UPDATE by replacing them in JOINS.
I know this is a large code and might not be right attempt to post here but I already to put this in a simple manner in another post and people doesn't get the idea of my question so I'm putting here the actual code.
This is the code I'm working on :
ALTER PROCEDURE [dbo].[proc_Upload] WITH RECOMPILE
as
set NoCount on
DECLARE
@StartTime datetime,
@EndTime datetime,
@DataID uniqueidentifier,
@CollectionDate datetime,
@Status int,
@PeriodID int,
@EndDate datetime,
@GroupID uniqueidentifier,
@ProjectID INT,
@FAID uniqueidentifier,
@UploadID int ,
@Createdate datetime
declare @projects table(
[ProjectID] [INT],
[Title] [varchar](255) ,
[currency] [int] ,
[Cost_Until_Today] [float] ,
[StartDate] [datetime] ,
[EndDate] [datetime] ,
[MisID] [uniqueidentifier] ,
[SystemStatus] [int] ,
[FacilitatingAgency] [uniqueidentifier] ,
[SyncMode] [int] )
INSERT @projects
SELECT ID, PROJECTNAME, PROJECTCURRENCY_ID, Cost_Until_Today, PROJECTESTABLISHEDDATE, EndDate, MisID, 4, FacilitatingAgency, SyncMode
FROM [dbo].[T_PROJECTS] /*thsi is the source table where every projectIDs need to be processed*/
IF exists ( select *
from dbo.sysobjects
where id = object_id(N'[dbo].[datasets]') and objectproperty(id, N'IsTable') = 1 )
DROP Table [dbo].datasets
SELECT mgd.Gd_ID, mg.Grp_ID, mgd.Gd_CollectionDate, mgd.Gd_IsDeleted, mgd.Gd_CreateDate, mg.Grp_Project, mg.Grp_IsDeleted , mg.Grp_Legacy_ID, p.LegacyProjectId
INTO datasets
FROM Savix_Service_Group..Group_Data mgd
INNER JOIN Savix_Service_Group..Group_Cycle mgc ON mgc.Gc_ID = mgd.Gd_CycleID
INNER JOIN Savix_Service_Group..Groups mg ON mg.Grp_ID = mgc.Gc_GroupID
inner join SG_Dynamic_Forms..v_projects p ON p.ProjectID = mg.Grp_Project
DECLARE projects1 CURSOR LOCAL FOR Select distinct ProjectID, isnull([FacilitatingAgency], '00000000-0000-0000-0000-000000000000') from @projects P WHERE P.SystemStatus = 4 AND P.SyncMode = 1
/*First cursor - fetch the cursor from ProjectaTable*/
OPEN projects1
FETCH NEXT FROM projects1 INTO @ProjectID, @FAID
WHILE @@FETCH_STATUS = 0
BEGIN
BEGIN TRY
BEGIN TRAN
DELETE FROM T_PROJECTGROUPSDATA
WHERE T_PROJECTGROUPSDATA.UPLOAD_ID IN (SELECT ID FROM T_UPLOADS WHERE project_savix_ID = @ProjectID AND UPLOADFILENAME = 'Automatic upload from web MIS')
DECLARE datasets CURSOR LOCAL FAST_FORWARD FOR SELECT Gd_ID, Grp_ID, Gd_CollectionDate, Gd_IsDeleted, Gd_CreateDate
FROM datasets
WHERE LegacyProjectId = @ProjectID AND Grp_IsDeleted = 0 AND Gd_IsDeleted != 1
/*Second cursor - this will get the 'collectionDate'field from datasetsTable for every project fetched in above cursor and also get @dataID which is used to insert value in to other table-T_PROJECTGROUPSDATA*/
OPEN datasets
FETCH NEXT FROM datasets INTO @DataID, @GroupID, @CollectionDate, @Status, @Createdate
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE period CURSOR LOCAL FAST_FORWARD FOR SELECT ID, dbo.fn_GetEndOfPeriod(ID) FROM T_PERIODS
/* dbo.fn_GetEndOfPeriod(ID) - this function will give the end of the date of that specifc quarter for any given date*/
WHERE DATEDIFF(dd,@CollectionDate,dbo.fn_GetEndOfPeriod(ID)) >= 0
ORDER BY [YEAR],[Quarter]
/*Third Cursor - this will process the records from another table called period with above fetched @collectionDate*/
OPEN period
FETCH NEXT FROM period INTO @PeriodID, @EndDate
WHILE @@FETCH_STATUS = 0
BEGIN
IF EXISTS (SELECT * FROM Savix_Service_Group..Group_Data mgd
INNER JOIN Savix_Service_Group..Group_Cycle mgc ON mgc.Gc_ID = mgd.Gd_CycleID
WHERE mgc.Gc_GroupID = @GroupID
AND DATEDIFF(dd,mgd.Gd_CollectionDate,@EndDate) >= 0
AND (mgd.Gd_CollectionDate > @CollectionDate )
AND mgd.Gd_IsDeleted != 1)
BEGIN
BREAK
END
IF EXISTS (SELECT ID FROM T_UPLOADS u
WHERE u.project_savix_ID = @ProjectID AND u.PERIOD_ID = @PeriodID AND u.STATUS = 3 AND UPLOADFILENAME != 'Automatic upload from web MIS')
BEGIN
FETCH NEXT FROM period INTO @PeriodID, @EndDate
CONTINUE
END
SET @UploadID = (SELECT ID FROM T_UPLOADS u WHERE u.project_savix_ID = @ProjectID AND u.PERIOD_ID = @PeriodID AND u.STATUS = 3)
/*If T_uploads doesn't have appropirate period ID from cursor fetch then create a new entry in T_uploads with current projectID*/
IF @UploadID IS NULL
BEGIN
declare @Project_ID_Legacy int = ISNULL((select distinct PROJECT_ID from T_UPLOADS where project_savix_ID = @ProjectID),@ProjectID)
INSERT INTO T_UPLOADSFIRSTSTEP
(PROJECT_ID
,UPLOADDATE
,UPLOADFILENAME
,UPLOADUSER_ID
,CURRENTSTEP
,STATUS
,Project_ID_MIS)
SELECT @Project_ID_Legacy , --mm 06/12
GETDATE(),
'Automatic upload from web MIS',
2,
2,
0,
@ProjectID
INSERT INTO T_UPLOADS
(ID, periodID, projectID,UPLOADDATE,UPLOADFILENAME,UPLOADUSER_ID )
SELECT uf.ID,
@PeriodID,
uf.PROJECT_ID,
uf.UPLOADDATE,
uf.UPLOADFILENAME,
uf.UPLOADUSER_ID
FROM T_UPLOADSFIRSTSTEP uf
INNER JOIN @projects mp ON uf.Project_ID_MIS = mp.ProjectID
WHERE uf.Project_ID_MIS = @ProjectID AND uf.STATUS = 0
AND NOT EXISTS (SELECT * FROM T_UPLOADS u WHERE u.PROJECT_ID = uf.PROJECT_ID AND u.PERIOD_ID = @PeriodID AND u.STATUS = 3)
DELETE FROM T_UPLOADSFIRSTSTEP WHERE STATUS = 0 AND PROJECT_ID = @Project_ID_Legacy
SET @UploadID = (SELECT ID FROM T_UPLOADS u WHERE u.project_savix_ID = @ProjectID AND u.PERIOD_ID = @PeriodID AND u.STATUS = 3)
END ELSE
UPDATE T_UPLOADS SET
TOTALEXPENDITURES = CASE WHEN DATEDIFF(d,mp.StartDate,mp.EndDate) != 0
THEN mp.Cost_Until_Today*DATEDIFF(d,mp.StartDate,dbo.fn_GetEndOfPeriod(@PeriodID))/DATEDIFF(d,mp.StartDate,mp.EndDate)/dbo.fn_RateAtDate(mp.Currency,dbo.fn_GetEndOfPeriod(@PeriodID))
ELSE 0 END,
TotalExpendituresNative = CASE WHEN DATEDIFF(d,mp.StartDate,mp.EndDate) != 0
THEN mp.Cost_Until_Today*DATEDIFF(d,mp.StartDate,dbo.fn_GetEndOfPeriod(@PeriodID))/DATEDIFF(d,mp.StartDate,mp.EndDate)
ELSE 0 END
FROM @projects mp
WHERE T_UPLOADS.ID = @UploadID AND mp.ProjectID = @ProjectID
INSERT INTO T_PROJECTGROUPSDATA
(uploadID, fieldA,fieldB,......fieldN )
SELECT @UploadID,p.fieldA,mg.fieldB,......mgc.fieldN
FROM @projects mp
inner join SG_Dynamic_Forms..v_projects p ON p.LegacyProjectId = mp.projectID
inner join Savix_Service_Group..Groups mg ON mg.Grp_Project = p.ProjectID
INNER JOIN Savix_Service_Group..Group_Cycle mgc ON mgc.Gc_GroupID = mg.Grp_ID
INNER JOIN Savix_Service_Group..Group_Data mgd ON mgd.Gd_CycleID = mgc.Gc_ID
LEFT JOIN Savix_Service_Trainers..Trainers me ON me.Tr_ID = mgc.Gc_MonitoredBy
LEFT JOIN Savix_Service_Dictionaries..Dictionary mgt ON mgt.Dny_ID = me.Tr_Type
--left join v1_Report_UDF_Data_UploadToSavix udf on udf.DataID = mgd.Gd_ID
WHERE mgd.Gd_ID = @DataID
FETCH NEXT FROM period INTO @PeriodID, @EndDate
END
CLOSE period
DEALLOCATE period
FETCH NEXT FROM datasets INTO @DataID, @GroupID, @CollectionDate, @Status, @Createdate
END
CLOSE datasets
DEALLOCATE datasets
COMMIT
END TRY
BEGIN CATCH
SELECT ERROR_NUMBER(), ERROR_MESSAGE(),@PeriodID, @ProjectID, @UploadID,@DataID
IF CURSOR_STATUS('global' , 'period') >= 0
BEGIN
CLOSE period
DEALLOCATE uploadID
END
IF CURSOR_STATUS('global' , 'datasets') >= 0
BEGIN
CLOSE datasets
DEALLOCATE datasets
END
IF @@TRANCOUNT > 0
ROLLBACK
INSERT INTO error_catch_UploadtoSavix
SELECT cast(ERROR_NUMBER() as nvarchar), ERROR_MESSAGE(),@PeriodID, @ProjectID, @UploadID,@DataID, getdate()
END CATCH
FETCH NEXT FROM projects1 INTO @ProjectID, @FAID
END
CLOSE projects1
DEALLOCATE projects1
SELECT 1 as success
Here is what I've tried to do and stuck in middle and incomplete:
select s.ID,u.*
from T_PROJECTS_TEMP pt
INNER JOIN
(SELECT mgd.Gd_ID, mg.Grp_ID, mgd.Gd_CollectionDate, mgd.Gd_IsDeleted, mgd.Gd_CreateDate, mg.Grp_Project, mg.Grp_IsDeleted , mg.Grp_Legacy_ID, p.LegacyProjectId
FROM Savix_Service_Group..Group_Data mgd
INNER JOIN Savix_Service_Group..Group_Cycle mgc ON mgc.Gc_ID = mgd.Gd_CycleID
INNER JOIN Savix_Service_Group..Groups mg ON mg.Grp_ID = mgc.Gc_GroupID
inner join SG_Dynamic_Forms..v_projects p ON p.ProjectID = mg.Grp_Project
WHERE LegacyProjectId = 5047 AND Grp_IsDeleted = 0 AND Gd_IsDeleted != 1 ) dataset on pt.ID = dataset.LegacyProjectId
INNER JOIN T_PERIODS s ON DATEDIFF(DAY,dataset.Gd_CollectionDate,dbo.fn_GetEndOfPeriod(s.ID)) >= 0
LEFT JOIN T_UPLOADS u ON u.project_savix_ID = pt.ID AND u.PERIOD_ID = s.ID AND u.STATUS = 3 --AND u.UPLOADFILENAME != 'Automatic upload from web MIS'
WHERE NOT EXISTS (SELECT * FROM Savix_Service_Group..Group_Data mgd
INNER JOIN Savix_Service_Group..Group_Cycle mgc ON mgc.Gc_ID = mgd.Gd_CycleID
WHERE mgc.Gc_GroupID = Grp_ID
AND DATEDIFF(dd,mgd.Gd_CollectionDate,(select dbo.fn_GetEndOfPeriod(s.ID))) >= 0
AND (mgd.Gd_CollectionDate > Gd_CollectionDate) AND mgd.Gd_IsDeleted != 1 )
AND NOT EXISTS (SELECT ID FROM T_UPLOADS u
WHERE u.project_savix_ID = pt.ID AND u.PERIOD_ID = s.ID AND u.STATUS = 3 AND UPLOADFILENAME != 'Automatic upload from web MIS')
ORDER BY s.ID
Sample Data
Tables and data for T_Projects and datasets - http://www.sqlfiddle.com/#!18/00205/2
Tables and data for T_Periods - http://www.sqlfiddle.com/#!18/3b6b1/2
Tables and data for T_uploads (before the cursor running)- http://www.sqlfiddle.com/#!18/7d12f3/1
Other tables data 1 - http://www.sqlfiddle.com/#!18/6e499
other tables data 2 - http://www.sqlfiddle.com/#!18/30e3b/3
function fn_getEndOFperiod - http://www.sqlfiddle.com/#!18/54d66
PLease note - there will be more than one row in any of the above tables, however the example I had given will have just one record.
Excepted Data
T_uploads table - http://sqlfiddle.com/#!18/027cc/1
T_ProjectsGroupData will have total of 15 records aswell i.e for every T_uploads.ID (which is relation to T_ProjectsGroupData .UploadID field) there will be a entry in T_ProjectsGroupData
I except this CURSOR based code to be converted in to a simple SET based approach, any help and suggestion to improve this code will be much much appreciable.
To explain how I came up with this, here is a modified version of the code below. I cannot guarantee I got everything, I had to make some assumptions. There was a lacking of sample data and all the savix tables are missing. This is the best answer I can come up with given the information and limiting my time to less than 4 hours invested. I could do much more but you would have to give me more than internet fame.
It is difficult to work and test cursors, so I replaced them ( projects, projects1, datasets and period ) with temp tables containing an identity and looped through them.
Projects contains all ID's ( projectID ) from T_Projects, the only fields that are used from this table are the ID, projectcurrency, projectestablisheddate ( startdate ), enddate and syncmode ( must be = 1, we will see that later ). We can ignore systemstatus as it is a constant.
The table dastasets is dropped and loaded here. We only care about the gd_id from the Savix_Service_Group..Group_Data table but still need to join to the other tables in case that they filter out values we don't want. This logic is used again down below when inserting records.
Here we are inserting into projects1 a distinct value of project id's. Without more data, my assumption is that the ID is distinct in the T_projects table and therefore this step is somewhat unnecessary aside from filtering out by SyncMode = 1. Remember that SystemStatus is a constant and we populated it with the value '4', this criteria could be removed with no effect.
Here we insert into #datasets filtering by T_Project.ID AND Savix_Service_Group..Groups.Grp_IsDeleted = 0 AND Savix_Service_Group..Group_Data.Gd_IsDeleted != 1
This is probably the worst use of a cursor here. We load all the periods and cycle through them. In the end we only care about periods that are in the T_UPLOADS table.
Here we skip any periods where UPLOADFILENAME != 'Automatic upload from web MIS'
With that in mind we only care where status = 3 and UPLOADFILENAME = 'Automatic upload from web MIS'
Why they use a second table ( T_UPLOADFIRSTSTEP ) to generate a record where they only use the id and then delete the record using that ID in the creation of the T_Uploads record is beyond my understanding and appears to me to be terrible coding.
The insert into T_UPLOADS will always fail as there are fields that can not be null that are not in the insert list and do not have a default value assigned to them.
So here is the real logic that is done inside those 3 cursors. Understanding that we don't care about periods, the logic loops through them all. We only care that we have to filter certain criteria ( status = 3 and UPLOADFILENAME = 'Automatic upload from web MIS' ) and that they have a matching T_Project.ID table with the criteria T_Projects.SyncMode = 1
The insert into T_PROJECTGROUPDATA has duplicated the logic used to create the @dataid variable derived from #datasets. Which is derived from the datasets table we dropped and created above.
There you have it. Condensing nearly 300 lines and 3 cursors down to 30 lines and no cursors.