Outdated Version

You are viewing an older version of this section. View current production version.

Persistent Computed Columns

A computed column is a column defined by an expression that uses other columns in the table. MemSQL allows users to create persistent computed columns defined by an expression that combines other columns, constants, built-in functions,and operators operators. MemSQL’s computed columns are fully materialized and can be indexed like a standard column. However, computed columns cannot be part of the shard key.

There are storage and performance tradeoffs to consider when using computed columns. At a high level, computed columns are a way to optimize computationally expensive read queries that use built-in functions or require additional data processing. They allow users to precompute values that would otherwise have to be computed as part of execution of a read query. However, computed columns also consume additional storage and require more computation on writes. Some example use cases for computed columns include:

  • Parsing JSON blobs for improved read performance
  • Precomputing a value using a function that includes values from other columns in the table
  • Extracting values from a column, such as a year from a timestamp or a domain from a URL

Computed columns can be created as part of a CREATE TABLE statement, or can be added later using ALTER TABLE .

Computed Column Examples

The following statement creates a table t with an INT column a and a persistent computed column b that is defined using an expression that takes the value in a and adds 1 to it.

memsql> CREATE TABLE t (a INT PRIMARY KEY, b AS a + 1 PERSISTED INT);

Inserting values into column a will automatically populate column b using the column definition expression.

    memsql> INSERT INTO t (a) VALUES (1), (2);
    Query OK, 2 rows affected (0.01 sec)
    Records: 2  Duplicates: 0  Warnings: 0

    memsql> SELECT * FROM t;
    +---+------+
    | a | b    |
    +---+------+
    | 2 |    3 |
    | 1 |    2 |
    +---+------+
    2 rows in set (0.00 sec)

It is also possible to add a computed column to an existing table using ALTER TABLE .

    memsql> ALTER TABLE t ADD COLUMN c AS a * 2 - b PERSISTED INT;
    Query OK, 0 rows affected (11.21 sec)
    Records: 0  Duplicated: 0  Warnings: 0

    memsql> SELECT * FROM t;
    +---+------+------+
    | a | b    | c    |
    +---+------+------+
    | 2 |    3 |    1 |
    | 1 |    2 |    0 |
    +---+------+------+
    2 rows in set (0.00 sec)

Attempts to insert values directly into a computed column will result in an error, even if the values are “correct” with respect to the column definition.

    memsql> INSERT INTO t (a, b, c) VALUES (3, 4, 2);
    ERROR 1822 (HY000): Cannot insert into computed column 'b'

Note that a computed column can be created using any expression, that outputs a value that is a valid MemSQL data type, composed of operators and built-in functions. For example, computed columns allow you to extract values from a JSON blob to they can be scanned and read more quickly. However, note that aggregations and non-deterministic functions such as rand() and now() cannot be used in computed columns.

   memsql> SELECT * FROM j;
   +---+-------------------------------------------------+
   | a | b                                               |
   +---+-------------------------------------------------+
   | 1 | {"name":"Sandy Beach","occupation":"lifeguard"} |
   +---+-------------------------------------------------+
   1 row in set (0.00 sec)

   memsql> ALTER TABLE j ADD COLUMN occupation AS b::$occupation PERSISTED TEXT;
   Query OK, 0 rows affected (12.80 sec)
   Records: 0  Duplicates: 0  Warnings: 0

   memsql> SELECT  * FROM j;
   +---+-------------------------------------------------+------------+
   | a | b                                               | occupation |
   +---+-------------------------------------------------+------------+
   | 1 | {"name":"Sandy Beach","occupation":"lifeguard"} | lifeguard  |
   +---+-------------------------------------------------+------------+
   1 row in set (3.73 sec)

Modifying values in a column used to define a computed column will also update the affected values in the computed column.

    memsql> update j set b::$occupation = 'circus clown';
    Query OK, 1 row affected (3.49 sec)

    memsql> SELECT * FROM j;
    +---+----------------------------------------------------+--------------+
    | a | b                                                  | occupation   |
    +---+----------------------------------------------------+--------------+
    | 1 | {"name":"Sandy Beach","occupation":"circus clown"} | circus clown |
    +---+----------------------------------------------------+--------------+
    1 row in set (0.00 sec)