Not getting the correct SUM values in SQL Server 2012 when using a PIVOT

994 Views Asked by At

I am trying to create a query that will pivot over some rows but will SUM some columns and then group them together. I've used the PIVOT function before but I am running into issues when my result set contains similar values.

This is SQL Server 2012.

Sample code:

CREATE TABLE #Foo
(
     Store varchar(50), 
     Employee varchar(50), 
     Sold money, 
     Waste money, 
     Tmsp datetime
)

INSERT INTO #Foo 
VALUES
   ('Harrisburg', 'John', 20.00, 10.00, GETDATE()),
   ('Harrisburg', 'John', 20.00, 10.00, GETDATE()),
   ('Harrisburg', 'Jim', 20.00, 10.00, GETDATE()),
   ('Seattle', 'Jim', 20.00, 10.00, GETDATE()),
   ('Seattle', 'Alex', 20.00, 10.00, GETDATE())

SELECT 
    Store,
    SUM(Sold) TotalSold,
    SUM([John]) WastedByJohn,
    SUM([Jim]) WastedByJim,
    SUM([Alex]) WastedByAlex
FROM 
    #Foo
PIVOT
    (SUM(Waste)
     FOR Employee IN ([John], [Jim], [Alex])
    ) PVT
GROUP BY 
   Store

DROP TABLE #Foo

This yields the following results:

Store      | TotalSold | WastedByJohn | WastedByJim | WastedByAlex
Harrisburg | 20.00     | 20.00        | 10.00       | NULL
Seattle    | 20.00     | NULL         | 10.00       | 10.00

Shouldn't the TotalSold for Harrisburg be 60.00 and the TotalSold for Seattle be 40.00 based on the data in the table?

It gets harder for me to understand, because if I change the data so that the values aren't the same, I get the correct results.

INSERT INTO #Foo 
VALUES
    ('Harrisburg', 'John', 25.00, 10.00, GETDATE()),
    ('Harrisburg', 'John', 30.00, 10.00, GETDATE()),
    ('Harrisburg', 'Jim', 40.00, 10.00, GETDATE()),
    ('Seattle', 'Jim', 50.00, 10.00, GETDATE()),
    ('Seattle', 'Alex', 60.00, 10.00, GETDATE())

This set of data yields the expected result:

Store      | TotalSold | WastedByJohn | WastedByJim | WastedByAlex
Harrisburg | 95.00     | 20.00        | 10.00       | NULL
Seattle    | 110.00    | NULL         | 10.00       | 10.00

I looked around for a bit and couldn't find an answer as to why the PIVOT would be different based on distinct values when it comes to aggregation. I feel like there's something fundamental that I'm missing here, unless I just happened to come across some issue with SQL Server which is unlikely.

Any help would be greatly appreciated.

Thanks!

2

There are 2 best solutions below

1
On BEST ANSWER

The following query should give you what you want:

SELECT Store,
       TotalSold,
       [John] AS WastedByJohn,
       [Jim] AS WastedByJim,
       [Alex] AS WastedByAlex
FROM (SELECT Store, Employee, Waste,
             SUM(Sold) OVER (PARTITION BY Store) AS TotalSold
      FROM #Foo) src
PIVOT
    (SUM(Waste)
     FOR Employee IN ([John], [Jim], [Alex])
    ) PVT

To understand why you get unexpected results, try your query without the GROUP BY clause:

SELECT Store, Sold, [John], [Jim], [Alex]
FROM 
    #Foo
PIVOT
    (SUM(Waste)
     FOR Employee IN ([John], [Jim], [Alex])
    ) PVT

Output:

Store       Sold    John    Jim     Alex
Harrisburg  20,00   20,00   10,00   NULL
Seattle     20,00   NULL    10,00   10,00

Now, try the same again with second version of sample data:

Output:

Store       Sold    John    Jim     Alex
Harrisburg  25,00   10,00   NULL    NULL
Harrisburg  30,00   10,00   NULL    NULL
Harrisburg  40,00   NULL    10,00   NULL
Seattle     50,00   NULL    10,00   NULL
Seattle     60,00   NULL    NULL    10,00

By comparing the 2 different result sets you can clearly see that PIVOT takes place for every combination of columns not participating in it, i.e. for every combination of Store, Sold.

In first case there is only Harrisburg,20,00 and Seattle,20,00. That's why you only get two rows in this case. In second case you have a total of 3 + 2 = 5 combinations.

You can now see why GROUP BY works only in the second case.

0
On

You are not getting what does the pivot statement. Let me explain. First of all there are 3 elements: spreading, aggregation and grouping. Spreading is the one you get in columns i.e. Employee IN ([John], [Jim], [Alex]). Aggregation is SUM(Waste). So what is grouping element? The last one is determined by elimination of columns. I.e. every column but aggregation and spreading. In your example it will be Store, Sold, Tps. It will group the data by those 3 columns. But you don't want this. You want to group only by Store. So what to do? I can suggest to use conditional aggregation:

SELECT 
    Store,
    SUM(Sold) TotalSold,
    SUM(CASE WHEN Employee = 'John' THEN Waste ELSE 0 END) WastedByJohn,
    SUM(CASE WHEN Employee = 'Jim' THEN Waste ELSE 0 END) WastedByJim,
    SUM(CASE WHEN Employee = 'Alex' THEN Waste ELSE 0 END) WastedByAlex
FROM #Foo
GROUP BY Store