You are viewing an older version of this section. View current production version.
Window Functions Guide
Window functions perform ranking and aggregate calculations across sets of rows that bear some relation to the current row. Unlike aggregates, which also act on multiple rows, window functions return a result for every row, instead of aggregating values from all rows into a single value.
Syntax
Window functions in MemSQL use the following syntax:
function ([expr]) OVER {window_name | (window_definition)}
window_definition:
[partition_by_clause] [order_by_clause] [frame_clause]
partition_by_clause:
PARTITION BY {col | expr}
order_by_clause:
ORDER BY {col | expr}
frame_clause:
{ROWS | RANGE}
{UNBOUNDED PRECEDING | int PRECEDING | CURRENT ROW} |
{BETWEEN
{UNBOUNDED PRECEDING | int {PRECEDING | FOLLOWING} | CURRENT ROW}
AND
{UNBOUNDED FOLLOWING | int {PRECEDING | FOLLOWING} | CURRENT ROW}
}
Syntax Remarks
window_definition
Window definition clauses are similar to standard GROUP BY
and ORDER BY
clauses with the following exception: you cannot refer to the projections list using aliases or position, because literal integers are treated as integers instead of positional references.
frame_clause
The frame_clause
is used with an aggregate window function, and it specifies a subset of rows in a partition to evaluate. Ranking, value, and percentile window functions are not affected by the frame_clause
because they are intended to operate on the entire set of rows in each partition.
In the frame_clause
, int PRECEDING
and int FOLLOWING
are referred to as frame offsets. RANGE
cannot be used with frame offsets. Additionally, frame offsets must be non-negative integers. Any non-integer is automatically rounded.
If BETWEEN
is not specified, the default upper bound is CURRENT ROW
. For example, ROWS UNBOUNDED PRECEDING
is equivalent to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
If no frame is specified, the default is RANGE UNBOUNDED PRECEDING
, i.e. RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.
RANGE
includes all rows whose elements in the order_by_clause
are equal. If the order_by_clause
is empty, RANGE
will include all rows in the partition. Consider the following example:
memsql> CREATE TABLE t (a INT);
memsql> INSERT INTO t VALUES (1),(2),(2),(3),(3),(3),(4);
memsql> SELECT COUNT(1) OVER (ORDER BY a ROWS UNBOUNDED PRECEDING) as 'col1',
COUNT(1) OVER (ORDER BY a RANGE UNBOUNDED PRECEDING) as 'col2' FROM t;
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 2 | 3 |
| 3 | 3 |
| 4 | 6 |
| 5 | 6 |
| 6 | 6 |
| 7 | 7 |
+------+------+
Supported Functions
The following built-in functions are supported in a window function:
Ranking functions:
Value functions:
Aggregate functions:
Percentile functions:
Named Window Definition
window_name
is a reference to a named window definition given in a WINDOW
clause, such as:
SELECT RANK() OVER window_name FROM table WINDOW window_name AS (window_definition)
If using multiple window functions on the same window frame, you may see improved performance by setting and using a window_name. Using the same window_name across multiple functions allows MemSQL code generation to make execution more efficient.
Consider the following examples. The statement below does not use a window_name
for two identical window definitions.
SELECT website, timestamp,
AVG(clicks) OVER (PARTITION BY website
ORDER BY timestamp
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING),
SUM(visits) OVER (PARTITION BY website
ORDER BY timestamp
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
FROM t;
The statement above can be improved by using window_name
syntax, which will improve performance:
SELECT website, timestamp,
AVG(clicks) OVER w,
SUM(visits) OVER w
FROM t
WINDOW w AS (PARTITION BY website
ORDER BY timestamp
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING);
The second version of the query has fewer parameters, so the generated code can be smaller and more efficient.
Examples
In the following simple examples, the table t
contains four uniquely identified values divided into two categories.
memsql> CREATE TABLE t (id int, category char(1), value int);
memsql> INSERT INTO t VALUES (1, 'a', 1), (2, 'a', 3),
(3, 'b', 2), (4, 'b', 4);
memsql> SELECT * FROM t ORDER BY id;
+----+----------+-------+
| id | category | value |
+----+----------+-------+
| 1 | a | 1 |
| 2 | a | 3 |
| 3 | b | 2 |
| 4 | b | 4 |
+----+----------+-------+
Selecting the average of a column with GROUP BY
performs an aggregation that returns a single row per group. Selecting (projecting over) id
would be meaningless in this case – MemSQL will return values from an arbitrary row.
memsql> SELECT category, avg(value) FROM t GROUP BY category;
+----------+------------+
| category | avg(value) |
+----------+------------+
| a | 2.0000 |
| b | 3.0000 |
+----------+------------+
In contrast, using avg
as a window function with OVER
and partitioning by category will compute the average value for each partition of category, then append that category-wise average to each row.
memsql> select id, category, value, avg(value) over (partition by category) from t;
+----+----------+-------+-----------------------------------------+
| id | category | value | avg(value) over (partition by category) |
+----+----------+-------+-----------------------------------------+
| 2 | a | 3 | 2.0000 |
| 1 | a | 1 | 2.0000 |
| 4 | b | 4 | 3.0000 |
| 3 | b | 2 | 3.0000 |
+----+----------+-------+-----------------------------------------+
Here is another example where we order by value
within each partition, and use the rank()
window function to produce the rank of each row within its partition.
memsql> SELECT id, category, value, rank() OVER (partition by category order by value) AS rank FROM t;
+------+----------+-------+------+
| id | category | value | rank |
+------+----------+-------+------+
| 1 | a | 1 | 1 |
| 2 | a | 3 | 2 |
| 3 | b | 2 | 1 |
| 4 | b | 4 | 2 |
+------+----------+-------+------+