Outdated Version

You are viewing an older version of this section. View current production version.

Window Functions Guide

Window functions perform ranking and aggregate calculations across sets of rows that bear some relation to the current row. Unlike aggregates, which also act on multiple rows, window functions return a result for every row, instead of aggregating values from all rows into a single value.

Syntax

The syntax for using window functions in MemSQL is the following:

function ([expr]) OVER {window_name | (window_definition)}

window_definition:
    [partition_by_clause] [order_by_clause] [frame_clause]

partition_by_clause:
    PARTITION BY {col | expr}

order_by_clause:
    ORDER BY {col | expr}

frame_clause:
    {ROWS | RANGE} UNBOUNDED PRECEDING | BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

Using window_name allows multiple functions to use the same window definition. window_name is a reference to a named window definition given in a WINDOW clause, like:

SELECT … FROM … WINDOW window_name AS (window_definition)

Supported Functions

Ranking functions:

Value functions:

Aggregate functions:

Notes

  • Window definition clauses are like standard GROUP BY and ORDER BY except that you can not refer to the projections list using aliases or position (literal integers will be treated as integers instead of positional references).
  • The default value for frame_clause is RANGE UNBOUNDED PRECEDING.

Examples

In the following simple examples, the table t contains four uniquely identified values divided into two categories.

memsql> create table t (id int, category char(1), value int);
memsql> insert into t values (1, 'a', 1), (2, 'a', 3), (3, 'b', 2), (4, 'b', 4);

memsql> select * from t order by id;
+----+----------+-------+
| id | category | value |
+----+----------+-------+
|  1 | a        |     1 |
|  2 | a        |     3 |
|  3 | b        |     2 |
|  4 | b        |     4 |
+----+----------+-------+

Selecting the average of a column with GROUP BY performs an aggregation that returns a single row per group. Selecting (projecting over) id would be meaningless in this case - MemSQL will return values from an arbitrary row.

memsql> select category, avg(value) from t group by category;
+----------+------------+
| category | avg(value) |
+----------+------------+
| a        |     2.0000 |
| b        |     3.0000 |
+----------+------------+

In contrast, using avg as a window function with OVER and partitioning by category will compute the average value for each partition of category, then append that category-wise average to each row.

memsql> select id, category, value, avg(value) over (partition by category) from t;
+----+----------+-------+-----------------------------------------+
| id | category | value | avg(value) over (partition by category) |
+----+----------+-------+-----------------------------------------+
|  2 | a        |     3 |                                  2.0000 |
|  1 | a        |     1 |                                  2.0000 |
|  4 | b        |     4 |                                  3.0000 |
|  3 | b        |     2 |                                  3.0000 |
+----+----------+-------+-----------------------------------------+

Here is another example where we order by value within each partition, and use the rank() window function to produce the rank of each row within its partition.

memsql> select id, category, value, rank() over (partition by category order by value) as rank from t;
+------+----------+-------+------+
| id   | category | value | rank |
+------+----------+-------+------+
|    1 | a        |     1 |    1 |
|    2 | a        |     3 |    2 |
|    3 | b        |     2 |    1 |
|    4 | b        |     4 |    2 |
+------+----------+-------+------+