Outdated Version

You are viewing an older version of this section. View current production version.

Window Functions Guide

Window functions perform ranking and aggregate calculations across sets of rows that bear some relation to the current row. Unlike aggregates, which also act on multiple rows, window functions return a result for every row, instead of aggregating values from all rows into a single value.

Syntax

Window functions in MemSQL use the following syntax:

function ([expr]) OVER {window_name | (window_definition)}

window_definition:
    [partition_by_clause] [order_by_clause] [frame_clause]

partition_by_clause:
    PARTITION BY {col | expr}

order_by_clause:
    ORDER BY {col | expr}

frame_clause:
    {ROWS | RANGE} 
        {UNBOUNDED PRECEDING | int PRECEDING | CURRENT ROW} |
        {BETWEEN 
            {UNBOUNDED PRECEDING | int {PRECEDING | FOLLOWING} | CURRENT ROW}
         AND
            {UNBOUNDED FOLLOWING | int {PRECEDING | FOLLOWING} | CURRENT ROW}
        }

Syntax Remarks

window_definition

Window definition clauses are similar to standard GROUP BY and ORDER BY clauses with the following exception: you cannot refer to the projections list using aliases or position, because literal integers are treated as integers instead of positional references.

frame_clause

The frame_clause is used with an aggregate window function, and it specifies a subset of rows in a partition to evaluate. Ranking, value, and percentile window functions are not affected by the frame_clause because they are intended to operate on the entire set of rows in each partition.

In the frame_clause, int PRECEDING and int FOLLOWING are referred to as frame offsets. RANGE cannot be used with frame offsets. Additionally, frame offsets must be non-negative integers. Any non-integer is automatically rounded.

If BETWEEN is not specified, the default upper bound is CURRENT ROW. For example, ROWS UNBOUNDED PRECEDING is equivalent to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

If no frame is specified, the default is RANGE UNBOUNDED PRECEDING, i.e. RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

RANGE includes all rows whose elements in the order_by_clause are equal. If the order_by_clause is empty, RANGE will include all rows in the partition. Consider the following example:

memsql> CREATE TABLE t (a INT);

memsql> INSERT INTO t VALUES (1),(2),(2),(3),(3),(3),(4);

memsql> SELECT COUNT(1) OVER (ORDER BY a ROWS UNBOUNDED PRECEDING) as 'col1',
          COUNT(1) OVER (ORDER BY a RANGE UNBOUNDED PRECEDING) as 'col2'  FROM t;
+------+------+
| col1 | col2 |
+------+------+
|    1 |    1 |
|    2 |    3 |
|    3 |    3 |
|    4 |    6 |
|    5 |    6 |
|    6 |    6 |
|    7 |    7 |
+------+------+

Supported Functions

The following built-in functions are supported in a window function:

Ranking functions:

Value functions:

Aggregate functions:

Percentile functions:

Named Window Definition

window_name is a reference to a named window definition given in a WINDOW clause, such as:

SELECT RANK() OVER window_name FROM table WINDOW window_name AS (window_definition)

If using multiple window functions on the same window frame, you may see improved performance by setting and using a window_name. Using the same window_name across multiple functions allows MemSQL code generation to make execution more efficient.

Consider the following examples. The statement below does not use a window_name for two identical window definitions.

SELECT website, timestamp, 
AVG(clicks) OVER (PARTITION BY website 
                  ORDER BY timestamp
                  ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING), 
SUM(visits) OVER (PARTITION BY website
                  ORDER BY timestamp
                  ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
FROM t;

The statement above can be improved by using window_name syntax, which will improve performance:

SELECT website, timestamp, 
AVG(clicks) OVER w, 
SUM(visits) OVER w 
FROM t 
WINDOW w AS (PARTITION BY website
              ORDER BY timestamp
              ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING);

The second version of the query has fewer parameters, so the generated code can be smaller and more efficient.

Examples

In the following simple examples, the table t contains four uniquely identified values divided into two categories.

memsql> CREATE TABLE t (id int, category char(1), value int);
memsql> INSERT INTO t VALUES (1, 'a', 1), (2, 'a', 3),
        (3, 'b', 2), (4, 'b', 4);

memsql> SELECT * FROM t ORDER BY id;
+----+----------+-------+
| id | category | value |
+----+----------+-------+
|  1 | a        |     1 |
|  2 | a        |     3 |
|  3 | b        |     2 |
|  4 | b        |     4 |
+----+----------+-------+

Selecting the average of a column with GROUP BY performs an aggregation that returns a single row per group. Selecting (projecting over) id would be meaningless in this case – MemSQL will return values from an arbitrary row.

memsql> SELECT category, avg(value) FROM t GROUP BY category;
+----------+------------+
| category | avg(value) |
+----------+------------+
| a        |     2.0000 |
| b        |     3.0000 |
+----------+------------+

In contrast, using avg as a window function with OVER and partitioning by category will compute the average value for each partition of category, then append that category-wise average to each row.

memsql> select id, category, value, avg(value) over (partition by category) from t;
+----+----------+-------+-----------------------------------------+
| id | category | value | avg(value) over (partition by category) |
+----+----------+-------+-----------------------------------------+
|  2 | a        |     3 |                                  2.0000 |
|  1 | a        |     1 |                                  2.0000 |
|  4 | b        |     4 |                                  3.0000 |
|  3 | b        |     2 |                                  3.0000 |
+----+----------+-------+-----------------------------------------+

Here is another example where we order by value within each partition, and use the rank() window function to produce the rank of each row within its partition.

memsql> SELECT id, category, value, rank() OVER (partition by category order by value) AS rank FROM t;
+------+----------+-------+------+
| id   | category | value | rank |
+------+----------+-------+------+
|    1 | a        |     1 |    1 |
|    2 | a        |     3 |    2 |
|    3 | b        |     2 |    1 |
|    4 | b        |     4 |    2 |
+------+----------+-------+------+