Is there a faster/better way to run this TSQL WHILE loop?

98 Views Asked by At

I'll start by just giving a brief explanation of my base data. I have a dataset where each row represents one work item. After this item is allocated to an employee, the employee will work the item. During this process, the item can hit a milestone called 'folder'. Each one of the work items is graded at the start of the process (Grade A, Grade B, etc), with better grades indicating a higher likelihood that the item makes it from allocation to folder.

What I am doing right now is running a test for a new way of allocating these work items. The items are routed to either someone in the test group or control group. What my code is doing is looping through the dataset to randomly select a percentage of the rows based on each item Grade + Group combination. The output is set up to show if the test or control group is expected to produce more folders based on the item grade and overall. It is also set up to show how many times the test/control group has more folders for each time it iterates through this loop.

My code works. It does what I need it to do. I just can't help but wonder if there is a better way to do this. Full disclosure, I'm not a DBA. I'm not a SQL expert. I'm just an analyst trying to figure out if there is a way I can optimize my code or run it differently. My code went from being very long and explicit to the point where I have it now using nested while loops. The performance didn't get any better but I cut it down drastically. If I set the code to 10,000 iterations, which is what I would prefer the minimum be, it takes over an hour to run through and my dataset is only going to grow so it will take longer and longer. If anyone has any tips or suggestions on a different way to structure this so it runs faster (assuming its possible), which would allow me to run the code more often (I work with impatient people lol), it would be greatly apprecaited.

This is a sample of my data.

This is my code (The #leads temp table is what is in the linked dataset):

--Creating a new table to put the random sample of allocations into
DROP TABLE IF EXISTS #FINAL;
CREATE TABLE #FINAL (LeadGrade varchar(20),
                     [Group] varchar(20),
                     Allocations int,
                     Folders int,
                     Pass int);

--Creating a table of test condition group names for the nested loops
DROP TABLE IF EXISTS #group;
CREATE TABLE #group (groupnum int,
                     testgroup varchar(8));
INSERT INTO #group
VALUES (1, 'Test'),
       (2, 'Control');

--Declaring all the necessary variables to run through the random sampling
DECLARE @GROUPNUM int = 1;
DECLARE @GRADENUM int = 1;
DECLARE @COUNTER int = 1;
DECLARE @ITERATIONS int = 1000;

--Top layer of the loop
--Checks how many times the inner loops have run. As long as it is less than the specified number of iterations, repeats the inner loops
WHILE @COUNTER <= @ITERATIONS
BEGIN

    --Creates a temp table of the leads with their milestones, adding on a random number value to each row
    --This random number will be different with each iteration due to the table being dropped and repopulated
    DROP TABLE IF EXISTS #allleadsrandom;
    SELECT l.*,
           CHECKSUM(NEWID()) AS RandomNumber
    INTO #allleadsrandom
    FROM #leads l
    ORDER BY CHECKSUM(NEWID()) DESC;

    --2nd layer loop. This loop will change the LeadGrade on each pass thru from A to B to C to D
    WHILE @GRADENUM <= 4
    BEGIN

        --3rd layer loop. This loop pulls data from the #allleadsrandom temp table starting
        --It uses the lead grade based on the higher loop number and the Group based on if it is the 1st (test) or 2nd (control) pass thru of this loop
        WHILE @GROUPNUM <= 2
        BEGIN

            --Pulling the top # of rows from #allrandomleads, based on the pct of leads in the data that are comprised of the current lead grade 
            DROP TABLE IF EXISTS #focus;
            SELECT TOP (SELECT PctOfLeads * 10 FROM #leadgradepct WHERE GradeNum = @GRADENUM)
                   *
            INTO #focus
            FROM #allleadsrandom
            WHERE LeadGrade = (SELECT LeadGrade FROM #leadgradepct WHERE gradenum = @GRADENUM)
              AND [group] = (SELECT testgroup FROM #group WHERE groupnum = @GROUPNUM);

            --Takes data from previous temp table and sums the number of allocations and folders
            DROP TABLE IF EXISTS #loader;
            SELECT LeadGrade,
                   [Group],
                   SUM(allocation) AS Allocations,
                   SUM(folder) AS Folders
            INTO #loader
            FROM #focus
            GROUP BY LeadGrade,
                     [Group];

            --Uses the loader temp table and inserts new row into the final temp table for analysis
            INSERT INTO #FINAL
            SELECT l.LeadGrade,
                   l.[Group],
                   l.Allocations,
                   l.Folders,
                   @COUNTER AS Pass
            FROM #loader l;

            --Reset the loop for the group condition, changing from test to control group. If control group was run, end the loop
            SET @GROUPNUM = @GROUPNUM + 1;
        END;

        --Reset the Group from control to test. Reset the loop for lead grade condition, changing grade from A to B to C to D. If Grade in current pass was D, end the loop
        SET @GROUPNUM = 1;
        SET @GRADENUM = @GRADENUM + 1;
    END;

    --Reset the entire loop. Set Grade back to A. Set Group back to test. Add one to the counter. If maximum number of iterations reached, end the loop.
    SET @GROUPNUM = 1;
    SET @GRADENUM = 1;
    SET @COUNTER = @COUNTER + 1;
END;

---------------------------------------------------------------------------------------------------------------------------------------

--Takes all the iterations of the nested loops and creates one final table of information
--Gives a summary of allocations and average folder count broken out by lead grade and test group
DROP TABLE IF EXISTS #results;
SELECT LeadGrade,
       [Group],
       SUM(Allocations) / @ITERATIONS AS AvgAllocations,
       SUM(Folders) / @ITERATIONS AS AvgFolders,
       MAX(Pass) AS #OfIterations
INTO #results
FROM #FINAL
GROUP BY LeadGrade,
         [Group]
ORDER BY LeadGrade ASC,
         [Group] DESC;

SELECT *
FROM #results
ORDER BY LeadGrade ASC,
         [Group] DESC;
SELECT [Group],
       SUM([AvgFolders]) AS AvgExpectedFolders
FROM #results
GROUP BY [Group]
ORDER BY [Group] DESC;

-------------------------------------------------------------------------------------------------------------------------------------------

--Another loop to go through the #FINAL temp table and output
--a comparison of how many times the test group had more folders vs how many times the control group had more folders
DECLARE @TESTTALLY AS int = 0;
DECLARE @CONTROLTALLY AS int = 0;
DECLARE @PASS AS int = 1;

WHILE @PASS <= @ITERATIONS
BEGIN

    --Get the total folders for the test group and the total folders for the Control group
    DECLARE @TESTCOUNT AS int = (SELECT SUM(Folders) AS Folders
                                 FROM #FINAL
                                 WHERE Pass = @PASS
                                   AND [Group] = 'Test'
                                 GROUP BY [Group]);
    DECLARE @CONTROLCOUNT AS int = (SELECT SUM(Folders) AS Folders
                                    FROM #FINAL
                                    WHERE Pass = @PASS
                                      AND [Group] = 'Control'
                                    GROUP BY [Group]);

    --Compare the folders in the test to control group for a unique iteration, add a tally to the group with more folders
    IF @TESTCOUNT > @CONTROLCOUNT
        SET @TESTTALLY = @TESTTALLY + 1;
    ELSE
        SET @CONTROLTALLY = @CONTROLTALLY + 1;

    SET @PASS = @PASS + 1;
END;

SELECT @TESTTALLY AS [Test Group More Folders],
       @CONTROLTALLY AS [Control Group More Folders];

0

There are 0 best solutions below