You are viewing an older version of this section. View current production version.
PERCENTILE_CONT and MEDIAN
In SingleStore DB, percentile functions are available as window functions and aggregate functions. There are two percentile functions: PERCENTILE_CONT
and PERCENTILE_DISC.
PERCENTILE_CONT
is an inverse distribution function. It assumes a continuous distribution between values of the expression in the sort specification. Then, it interpolates the value of that expression at the given percentile, performing a linear interpolation.
For an overview of window functions in SingleStore DB, see Window Functions Guide.
Syntax
PERCENTILE_CONT(percentile)
WITHIN GROUP (ORDER BY (col | expr))
[OVER (
[PARTITION BY (col | expr), ...]
[frame_clause]
)]
or equivalently
PERCENTILE_CONT(percentile)
OVER (
ORDER BY (col | expr)
[PARTITION BY {col | expr}, ...]
[frame_clause]
)
Also, MEDIAN
can be used as a regular aggregate function, as follows:
MEDIAN (col | expr)
Arguments
percentile
A numeric value between 0 and 1.
col
A column of a numeric data type.
expr
An expression that evaluates to a numeric data type.
Remarks
Both the PERCENTILE_CONT
and MEDIAN
functions can be used as standard aggregate functions.
The OVER()
clause defines how to partition the input set. If no PARTITION BY
clause is specified, the entire input set is treated as a single partition.
Only one column or expression is allowed in the ORDER BY
clause, and this column is the one for which the percentile is calculated.
Examples
Example 1
The following example demonstrates the use of PERCENTILE_CONT
as an aggregate function.
CREATE TABLE sets (a INT, b INT, c INT);
INSERT INTO sets VALUES (1,1,2),(2,2,3),(3,1,2),(4,2,4),(5,1,1),(2,1,1),(null,null,3);
SELECT * FROM sets;
****
+------+------+------+
| a | b | c |
+------+------+------+
| NULL | NULL | 3 |
| 1 | 1 | 2 |
| 2 | 1 | 1 |
| 2 | 2 | 3 |
| 3 | 1 | 2 |
| 4 | 2 | 4 |
| 5 | 1 | 1 |
+------+------+------+
SELECT PERCENTILE_CONT(0.5)
WITHIN GROUP (ORDER BY a)
FROM sets;
****
+------------------------------------------------+
| PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY a) |
+------------------------------------------------+
| 2.5000 |
+------------------------------------------------+
SELECT c, PERCENTILE_CONT(0.5)
WITHIN GROUP (ORDER BY a) AS "PERCENTILE"
FROM sets
GROUP BY c ORDER BY c;
****
+------+------------+
| c | PERCENTILE |
+------+------------+
| 1 | 3.5000 |
| 2 | 2.0000 |
| 3 | 2.0000 |
| 4 | 4.0000 |
+------+------------+
SELECT PERCENTILE_CONT(0.25)
WITHIN GROUP (ORDER BY a) AS "PERCENTILE"
FROM sets;
****
+------------+
| PERCENTILE |
+------------+
| 2.0000 |
+------------+
The percentile of a data set may change depending on how it is ordered.
SELECT PERCENTILE_CONT(0.25)
WITHIN GROUP (ORDER BY a DESC) AS "PERCENTILE"
FROM sets;
****
+------------+
| PERCENTILE |
+------------+
| 3.7500 |
+------------+
Multiple PERCENTILE_CONT
functions can be used in a single query, and each function must have its own WITHIN GROUP
clause.
SELECT PERCENTILE_CONT(0.25)
WITHIN GROUP (ORDER BY a) AS "PERCENTILE a",
PERCENTILE_CONT(0.25)
WITHIN GROUP (ORDER BY b) AS "PERCENTILE b"
FROM sets;
****
+--------------+--------------+
| PERCENTILE a | PERCENTILE b |
+--------------+--------------+
| 2.0000 | 1.0000 |
+--------------+--------------+
Example 2
The following example creates a table containing SAT scores. The resulting window frame shows the hypothetical 90th percentile SAT score given the existing scores.
CREATE TABLE SAT (reading_writing int, math int);
INSERT INTO SAT VALUES (800,800),(650,300),(700,500);
SELECT reading_writing, math, reading_writing + math AS total,
PERCENTILE_CONT(.9)
WITHIN GROUP (ORDER BY reading_writing + math)
OVER () AS `90th Percentile`
FROM SAT;
****
+-----------------+------+-------+-----------------+
| reading_writing | math | total | 90th Percentile |
+-----------------+------+-------+-----------------+
| 650 | 300 | 950 | 1520.0000 |
| 700 | 500 | 1200 | 1520.0000 |
| 800 | 800 | 1600 | 1520.0000 |
+-----------------+------+-------+-----------------+
Median
Compute the median using percentile = 0.5.
PERCENTILE_CONT( 0.5 ) ...
Example:
CREATE TABLE t (a int);
INSERT INTO t VALUES (1),(2),(3),(166),(1024);
SELECT a, AVG(a) OVER () AS mean,
PERCENTILE_CONT(.5)
WITHIN GROUP (ORDER BY a)
OVER () AS median
FROM t;
****
+------+----------+--------+
| a | mean | median |
+------+----------+--------+
| 1 | 239.2000 | 3.0000 |
| 2 | 239.2000 | 3.0000 |
| 3 | 239.2000 | 3.0000 |
| 166 | 239.2000 | 3.0000 |
| 1024 | 239.2000 | 3.0000 |
+------+----------+--------+
Median and Percentiles with Regular Aggregates and Grouping
Suppose you have some data with columns you want to group by and another column you want to aggregate using several aggregate functions, including standard ones like AVG, MIN, MAX etc., as well as percentiles and median. In other words, you want to use percentile functions and median just like regular aggregates.
For example, suppose you have the following data:
CREATE TABLE t (id int, category varchar(10), year int, value int);
INSERT t VALUES
(1, "ABC", 2019, 100),
(2, "ABC", 2019, 150),
(3, "ABC", 2019, 200),
(4, "ABC", 2020, 100),
(5, "ABC", 2020, 175),
(6, "ABC", 2020, 250),
(7, "XYZ", 2020, 200),
(8, "XYZ", 2020, 275),
(9, "XYZ", 2020, 350),
(10, "XYZ", 2020, 450);
Suppose you want to compute the following aggregates of value
grouped by
category
and year
for this data: SUM
, AVG
, MEDIAN
, 10th percentile, and 90th percentile. The query you specify is:
SELECT category, year, SUM(value), AVG(value), MEDIAN(value),
PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY value),
PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY value)
FROM t
GROUP BY category, year
ORDER BY category, year;
****
+----------+------+------------+------------+---------------+----------------------------------------------------+----------------------------------------------------+
| category | year | SUM(value) | AVG(value) | MEDIAN(value) | PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY value) | PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY value) |
+----------+------+------------+------------+---------------+----------------------------------------------------+----------------------------------------------------+
| ABC | 2019 | 450 | 150.0000 | 150.0000 | 110.0000 | 190.0000 |
| ABC | 2020 | 525 | 175.0000 | 175.0000 | 115.0000 | 235.0000 |
| XYZ | 2020 | 1275 | 318.7500 | 312.5000 | 222.5000 | 420.0000 |
+----------+------+------------+------------+---------------+----------------------------------------------------+----------------------------------------------------+
The above query gives the desired result because both the PERCENTILE_CONT
and MEDIAN
functions are allowed as standard aggregate functions. Also, it’s not necessary to use window functions to get this result.