Optimise query to return various Nth percentiles in MySQL 5.6

48 Views Asked by At

I have a dataset that contains the execution time for various api calls. What I am trying to do is return a table that provides the execution time for various percentiles across different api calls. For context, I am doing this in Domo, which means I am using MySQL 5.6.

Example of data:

Action executionTime
call_x 100
call_x 120
call_x 110
call_y 300
call_y 200
call_y 100

Expected Return:

Action Median 90th Perc 95th Perc
call_x 110 120 120
call_y 200 300 300

I have a solution that works, however it is quite slow over. I'm not sure how much memory Domo throws at an SQL dataflow however, I would like to optimise my query as much as possible. My solution is based on the one described here https://rpbouman.blogspot.com/2008/07/calculating-nth-percentile-in-mysql.html

SELECT
    action,
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(executionTimeMillis ORDER BY 
    executionTimeMillis SEPARATOR ','), ',', 99/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) 
    AS 99percentile,
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(executionTimeMillis` ORDER BY 
    executionTimeMillis SEPARATOR ','), ',', 95/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) 
    AS 95percentile,
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(executionTimeMillis ORDER BY 
    executionTimeMillis SEPARATOR ','), ',', 90/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) 
    AS 90percentile,
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(executionTimeMillis ORDER BY 
    executionTimeMillis SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) 
    AS median,
    AVG(executionTimeMillis) AS mean,
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(executionTimeMillis ORDER BY 
    executionTimeMillis SEPARATOR ','), ',', 25/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) 
    AS 1quartile
FROM MY_TABLE GROUP BY action

I suspect the inefficiency comes from repeatedly creating a list of ordered execution times:

GROUP_CONCAT(executionTimeMillis` ORDER BY executionTimeMillis SEPARATOR ',')

However, as I am using MySQL 5.6 I don't have the option to use a CTE for that.

What else can I do to improve the efficiency of this query? Up to and including scrapping the whole thing and starting again.

0

There are 0 best solutions below