Window functions perform ranking and aggregate calculations across sets of rows that bear some relation to the current row. Unlike aggregates, which also act on multiple rows, window functions return a result for every row, instead of aggregating values from all rows into a single value.
Syntax
Window functions in SingleStore DB use the following syntax:
function ([expr]) OVER {window_name | (window_definition)}
window_definition:
[partition_by_clause] [order_by_clause] [frame_clause]
partition_by_clause:
PARTITION BY {col | expr}
order_by_clause:
ORDER BY {col | expr}
frame_clause:
{ROWS | RANGE}
{UNBOUNDED PRECEDING | int PRECEDING | CURRENT ROW} |
{BETWEEN
{UNBOUNDED PRECEDING | int {PRECEDING | FOLLOWING} | CURRENT ROW}
AND
{UNBOUNDED FOLLOWING | int {PRECEDING | FOLLOWING} | CURRENT ROW}
}
Syntax Remarks
window_definition
Window definition clauses are similar to standard GROUP BY
and ORDER BY
clauses with the following exception: you cannot refer to the projections list using aliases or position, because literal integers are treated as integers instead of positional references.
frame_clause
The frame_clause
is used with an aggregate window function, and it specifies a subset of rows in a partition to evaluate. Ranking, value, and percentile window functions are not affected by the frame_clause
because they are intended to operate on the entire set of rows in each partition.
In the frame_clause
, int PRECEDING
and int FOLLOWING
are referred to as frame offsets. RANGE
cannot be used with frame offsets. Additionally, frame offsets must be non-negative integers. Any non-integer is automatically rounded.
If BETWEEN
is not specified, the default upper bound is CURRENT ROW
. For example, ROWS UNBOUNDED PRECEDING
is equivalent to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
If no frame is specified, the default is RANGE UNBOUNDED PRECEDING
, i.e. RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
RANGE
includes all rows whose elements in the order_by_clause
are equal. If the order_by_clause
is empty, RANGE
will include all rows in the partition. Consider the following example:
memsql> CREATE TABLE t (a INT);
memsql> INSERT INTO t VALUES (1),(2),(2),(3),(3),(3),(4);
memsql> SELECT COUNT(1) OVER (ORDER BY a ROWS UNBOUNDED PRECEDING) as 'col1',
COUNT(1) OVER (ORDER BY a RANGE UNBOUNDED PRECEDING) as 'col2' FROM t;
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 2 | 3 |
| 3 | 3 |
| 4 | 6 |
| 5 | 6 |
| 6 | 6 |
| 7 | 7 |
+------+------+
Supported Functions
The following built-in functions are supported in a window function:
Ranking functions:
Value functions:
Aggregate functions:
Percentile functions:
Named Window Definition
window_name
is a reference to a named window definition given in a WINDOW
clause, such as:
SELECT RANK() OVER window_name FROM table WINDOW window_name AS (window_definition)
If using multiple window functions on the same window frame, you may see improved performance by setting and using a window_name. Using the same window_name across multiple functions allows SingleStore DB code generation to make execution more efficient.
Consider the following examples. The statement below does not use a window_name
for two identical window definitions.
SELECT website, timestamp,
AVG(clicks) OVER (PARTITION BY website
ORDER BY timestamp
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING),
SUM(visits) OVER (PARTITION BY website
ORDER BY timestamp
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
FROM t;
The statement above can be improved by using window_name
syntax, which will improve performance:
SELECT website, timestamp,
AVG(clicks) OVER w,
SUM(visits) OVER w
FROM t
WINDOW w AS (PARTITION BY website
ORDER BY timestamp
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING);
The second version of the query has fewer parameters, so the generated code can be smaller and more efficient.
Examples
In the following simple examples, the table t
contains four uniquely identified values divided into two categories.
memsql> CREATE TABLE t (id int, category char(1), value int);
memsql> INSERT INTO t VALUES (1, 'a', 1), (2, 'a', 3),
(3, 'b', 2), (4, 'b', 4);
memsql> SELECT * FROM t ORDER BY id;
+----+----------+-------+
| id | category | value |
+----+----------+-------+
| 1 | a | 1 |
| 2 | a | 3 |
| 3 | b | 2 |
| 4 | b | 4 |
+----+----------+-------+
Selecting the average of a column with GROUP BY
performs an aggregation that returns a single row per group. Selecting (projecting over) id
would be meaningless in this case – SingleStore DB will return values from an arbitrary row.
memsql> SELECT category, avg(value) FROM t GROUP BY category;
+----------+------------+
| category | avg(value) |
+----------+------------+
| a | 2.0000 |
| b | 3.0000 |
+----------+------------+
In contrast, using avg
as a window function with OVER
and partitioning by category will compute the average value for each partition of category, then append that category-wise average to each row.
memsql> select id, category, value, avg(value) over (partition by category) from t;
+----+----------+-------+-----------------------------------------+
| id | category | value | avg(value) over (partition by category) |
+----+----------+-------+-----------------------------------------+
| 2 | a | 3 | 2.0000 |
| 1 | a | 1 | 2.0000 |
| 4 | b | 4 | 3.0000 |
| 3 | b | 2 | 3.0000 |
+----+----------+-------+-----------------------------------------+
Here is another example where we order by value
within each partition, and use the rank()
window function to produce the rank of each row within its partition.
memsql> SELECT id, category, value, rank() OVER (partition by category order by value) AS rank FROM t;
+------+----------+-------+------+
| id | category | value | rank |
+------+----------+-------+------+
| 1 | a | 1 | 1 |
| 2 | a | 3 | 2 |
| 3 | b | 2 | 1 |
| 4 | b | 4 | 2 |
+------+----------+-------+------+