Query for calculating percentile based on average

95 Views Asked by At

I have a group of athletes and I would like to calculate where their scores land from a percentile perspective. Doing so based on their best score doesn't make sense, but doing so based on their average is closer to the mark.

So let's assume this data:

athleteId points
1 100
1 200
1 300
1 400
1 500
2 101
2 202
2 303
2 404
2 505
3 10
3 20
3 30
3 40
3 50

Using this data, I can execute this simple query to calculate their average points:

SELECT
    t.athleteId,
    AVG(t.points) AS average
FROM table t
GROUP BY t.athleteId

And I will get a result of:

athleteId average
1 300
2 303
3 30

Using those averages, I now want to calculate the athlete's percentile based on their average score against the original table. To make things a bit more complicated (since this query is part of a bigger picture and I can't lose sight of that), I would also like to know the athlete's best score, the total number of scores, and how their best score ranked. Therefore, I would expect the following results:

athleteId average bestScore totalScores rank percentile
1 300 500 15 2 0.60
2 303 505 15 1 0.67
3 30 50 15 11 0.13

I need the SQL Server query that can produce that result.

1

There are 1 best solutions below

1
Brett On

I can't get your exact results. The data you set out is quite "santiary", in that each athlete's average score is also one of their actual scores, so the "percentile of average score" neatly resolves itself when we eyeball the data.

This solution implements a single query with CTE's - if you're writing a stored proc then you might want to use temp tables, and a variable for the total row count.

Anyway, here we go.

I'll declare a table variable to hold the data in this example.

declare @table table (athleteId int, points int)

insert into @table values 
  (1, 100), (1, 200), (1, 300), (1, 400), (1, 500)
, (2, 101), (2, 202), (2, 303), (2, 404), (2, 505) 
, (3, 10), (3, 20), (3, 30) ,(3, 40), (3, 50)

Now express the various CTE's.

The first one adds the rank of each score

; with ranks as
(
    select 
        athleteId
        ,points
        ,RANK() over (order by points desc) pointRank
    from @table
)

.. this one throws the total number of scores in ..

,ranksAndRowCount as
(
    select *, (select COUNT(*) from ranks) totalScores from ranks
)

.. this one calculates the averages ..

Note the cast to float, which will force a non-integer calculation later when we determine the percentile.

,averages as 
(
    SELECT
        t.athleteId,
        AVG(t.points) AS average,
        MAX(t.points) as bestScore,
        min(t.pointRank) as bestScoreRank,
        cast(min(t.totalScores) as float) as totalScores
    FROM ranksAndRowCount t
    GROUP BY t.athleteId
)

.. so far, so good.

Then I had difficulty with the percentile. I have taken the athletes calculated average and added it to the total list of scores, performed a rank, pulled out the rank for the average score, and then divided that rank by the total row count + 1.

select   athleteId
        ,average
        ,bestScore
        ,bestScoreRank
        ,totalScores,
         (
            select   avg(pointsInclAverageRanked.pcrank) 
              from   (
                        select   points



                                ,pcrank = rank() over (order by points) 
                          from   (  
                                    select   points 
                                     -- need to put the average back into the original scores
                                      from   @table union select a.average
                                  ) pointsInclAverage
                     ) pointsInclAverageRanked 
              where  pointsInclAverageRanked.points = a.average
         )
          -- there are more total scores, because we added in the average
          / (1.0+totalScores) averageScorePercentile
from averages a

It's not quite the same result as you were looking for, but it gets close.

athleteId average bestScore bestScoreRank totalScores averageScorePercentile
1 300 500 2 15 0.625
2 303 505 1 15 0.6875
3 30 50 11 15 0.1875

Hope this helps you along your way.