I have a dataset that contains the execution time for various api calls. What I am trying to do is return a table that provides the execution time for various percentiles across different api calls. For context, I am doing this in Domo, which means I am using MySQL 5.6.
Example of data:
| Action | executionTime |
|---|---|
| call_x | 100 |
| call_x | 120 |
| call_x | 110 |
| call_y | 300 |
| call_y | 200 |
| call_y | 100 |
Expected Return:
| Action | Median | 90th Perc | 95th Perc |
|---|---|---|---|
| call_x | 110 | 120 | 120 |
| call_y | 200 | 300 | 300 |
I have a solution that works, however it is quite slow over. I'm not sure how much memory Domo throws at an SQL dataflow however, I would like to optimise my query as much as possible. My solution is based on the one described here https://rpbouman.blogspot.com/2008/07/calculating-nth-percentile-in-mysql.html
SELECT
action,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(executionTimeMillis ORDER BY
executionTimeMillis SEPARATOR ','), ',', 99/100 * COUNT(*) + 1), ',', -1) AS DECIMAL)
AS 99percentile,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(executionTimeMillis` ORDER BY
executionTimeMillis SEPARATOR ','), ',', 95/100 * COUNT(*) + 1), ',', -1) AS DECIMAL)
AS 95percentile,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(executionTimeMillis ORDER BY
executionTimeMillis SEPARATOR ','), ',', 90/100 * COUNT(*) + 1), ',', -1) AS DECIMAL)
AS 90percentile,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(executionTimeMillis ORDER BY
executionTimeMillis SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL)
AS median,
AVG(executionTimeMillis) AS mean,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(executionTimeMillis ORDER BY
executionTimeMillis SEPARATOR ','), ',', 25/100 * COUNT(*) + 1), ',', -1) AS DECIMAL)
AS 1quartile
FROM MY_TABLE GROUP BY action
I suspect the inefficiency comes from repeatedly creating a list of ordered execution times:
GROUP_CONCAT(executionTimeMillis` ORDER BY executionTimeMillis SEPARATOR ',')
However, as I am using MySQL 5.6 I don't have the option to use a CTE for that.
What else can I do to improve the efficiency of this query? Up to and including scrapping the whole thing and starting again.