Outdated Version

You are viewing an older version of this section. View current production version.

Query Plan Operations

This topic describes the operations that a query plan may use. These operations are displayed when you use MemSQL Studio Visual Explain, run EXPLAIN, or run PROFILE to show a query plan. The examples in this topic use two tables : t, a rowstore table with a primary key, and ct a columnstore table. These tables are both in database db1.

CREATE TABLE t(id INT PRIMARY KEY, a INT, b INT, KEY(a));

CREATE TABLE ct(a INT, b INT, KEY(a) USING CLUSTERED COLUMNSTORE, SHARD KEY(a));

Table access methods

  • TableScan - scans every row in a table using an index

  • IndexSeek - navigates to a particular row using an index

  • IndexRangeScan - scans a range of rows using an index

  • ColumnStoreScan scans a table using a columnstore index

  • OrderedColumnStoreScan - scans a table using a columnstore index in index order

    EXPLAIN SELECT * FROM t WHERE t.a = 5;
    ****
    +--------------------------------------------------------------------------+
    | EXPLAIN                                                                  |
    +--------------------------------------------------------------------------+
    | Gather partitions:all alias:remote_0                                     |
    | Project [t.id, t.a, t.b]                                                 |
    | IndexRangeScan db1.t, KEY a (a) scan:[a = 5] table_type:sharded_rowstore |
    +--------------------------------------------------------------------------+
    
    EXPLAIN SELECT * FROM ct;
    ****
    +----------------------------------------------------------------------------------------------+
    | EXPLAIN                                                                                      |
    +----------------------------------------------------------------------------------------------+
    | Gather partitions:all alias:remote_0                                                         |
    | Project [ct.a, ct.b]                                                                         |
    | ColumnStoreScan db1.ct, KEY a (a) USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore |
    +----------------------------------------------------------------------------------------------+
    

Group by and aggregations

  • Aggregate - computes an aggregate.

  • HashGroupBy - uses a hash table to compute group by results

  • StreamingGroupBy - leverages the fact that the underlying operation produces rows in order to compute group by results. This advantage of StreamingGroupBy is that it only uses a constant amount of memory

  • Distinct - removes duplicate rows

    EXPLAIN SELECT SUM(id) FROM t;
    ****
    +---------------------------------------------------------------+
    | EXPLAIN                                                       |
    +---------------------------------------------------------------+
    | Project [`SUM(id)`]                                           |
    | Aggregate [SUM(remote_0.`SUM(id)`) AS `SUM(id)`]              |
    | Gather partitions:all alias:remote_0                          |
    | Project [`SUM(id)`]                                           |
    | Aggregate [SUM(t.id) AS `SUM(id)`]                            |
    | TableScan db1.t, PRIMARY KEY (id) table_type:sharded_rowstore |
    +---------------------------------------------------------------+
    
    EXPLAIN SELECT SUM(id) FROM t GROUP BY a+1;
    ****
    +-----------------------------------------------------------------------------------------------+
    | EXPLAIN                                                                                       |
    +-----------------------------------------------------------------------------------------------+
    | Project [`SUM(id)`] est_rows:1                                                                |
    | HashGroupBy [SUM(remote_0.`SUM(id)`) AS `SUM(id)`] groups:[remote_0.op]                       |
    | Gather partitions:all est_rows:1 alias:remote_0                                               |
    | Project [`SUM(id)`, t.a + 1 AS op] est_rows:1                                                 |
    | HashGroupBy [SUM(t.id) AS `SUM(id)`] groups:[t.a + 1]                                         |
    | TableScan db1.t, PRIMARY KEY (id) table_type:sharded_rowstore est_table_rows:1 est_filtered:1 |
    +-----------------------------------------------------------------------------------------------+
    

Distributed data movement

  • Gather - collects all the results from the leaf nodes to the aggregator node. When a query can be routed to a single partition it has the attribute partitions:single. If Gather collects data from all the partitions the attribute is set to partitions:all. Queries that have partitions:single are called single partition queries. An advantage of single partition queries is that they can scale to much higher concurrency and throughput because they only need to execute on a single partition.

  • GatherMerge - collects ordered streams of rows from the leaves and merges them to output an ordered stream.

    EXPLAIN SELECT * FROM t WHERE id = 1;
    ****
    +-----------------------------------------------------------------------------+
    | EXPLAIN                                                                     |
    +-----------------------------------------------------------------------------+
    | Gather partitions:single alias:remote_0                                     |
    | Project [t.id, t.a, t.b]                                                    |
    | IndexSeek db1.t, PRIMARY KEY (id) scan:[id = 1] table_type:sharded_rowstore |
    +-----------------------------------------------------------------------------+
    
    EXPLAIN SELECT * FROM t WHERE id > 1;
    ****
    +----------------------------------------------------------------------------------+
    | EXPLAIN                                                                          |
    +----------------------------------------------------------------------------------+
    | Gather partitions:all alias:remote_0                                             |
    | Project [t.id, t.a, t.b]                                                         |
    | IndexRangeScan db1.t, PRIMARY KEY (id) scan:[id > 1] table_type:sharded_rowstore |
    +----------------------------------------------------------------------------------+
    
    EXPLAIN SELECT * FROM t ORDER BY id;
    ****
    +---------------------------------------------------------------+
    | EXPLAIN                                                       |
    +---------------------------------------------------------------+
    | GatherMerge [remote_0.id] partitions:all alias:remote_0       |
    | Project [t.id, t.a, t.b]                                      |
    | TableScan db1.t, PRIMARY KEY (id) table_type:sharded_rowstore |
    +---------------------------------------------------------------+
    
  • Repartition - redistributes a dataset to hash-partition it on a particular key

  • Broadcast - broadcasts a dataset to every node in a cluster

    EXPLAIN SELECT * FROM t, ct WHERE t.id = ct.b;
    ****
    +------------------------------------------------------------------------------------------------------------------------------------------+
    | EXPLAIN                                                                                                                                  |
    +------------------------------------------------------------------------------------------------------------------------------------------+
    | Project [t.id, t.a, t.b, ct.a, ct.b] est_rows:1                                                                                          |
    | HashJoin [t.id = ct.b]                                                                                                                   |
    | |---HashTableBuild alias:ct                                                                                                              |
    | |   Project [remote_1.a, remote_1.b] est_rows:1                                                                                          |
    | |   Gather partitions:all est_rows:1 alias:remote_1                                                                                      |
    | |   Project [ct_0.a, ct_0.b] est_rows:1                                                                                                  |
    | |   ColumnStoreScan db1.ct AS ct_0, KEY a (a) USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore est_table_rows:1 est_filtered:1 |
    | Gather partitions:all est_rows:1 alias:t                                                                                                 |
    | Project [remote_0.id, remote_0.a, remote_0.b] est_rows:1                                                                                 |
    | TableScan db1.t AS remote_0, PRIMARY KEY (id) table_type:sharded_rowstore est_table_rows:1 est_filtered:1                                |
    +------------------------------------------------------------------------------------------------------------------------------------------+
    
  • ChoosePlan indicates that MemSQL will choose one of the listed plans at runtime based on cost estimates. estimate illustrates the statistics that are being estimated, but note that these SQL statements are not actually estimated. Instead, MemSQL uses index information to estimate these statistics.

    EXPLAIN SELECT * FROM t WHERE id > 5 AND a > 5;
    ****
    +--------------------------------------------------------------------------------------+
    | EXPLAIN                                                                              |
    +--------------------------------------------------------------------------------------+
    | Gather partitions:all alias:remote_0                                                 |
    | Project [t.id, t.a, t.b]                                                             |
    | ChoosePlan                                                                           |
    | |   :estimate                                                                        |
    | |       SELECT COUNT(*) AS cost FROM db1.t WHERE t.id > 5                            |
    | |       SELECT COUNT(*) AS cost FROM db1.t WHERE t.a > 5                             |
    | |---Filter [t.a > 5]                                                                 |
    | |   IndexRangeScan db1.t, PRIMARY KEY (id) scan:[id > 5] table_type:sharded_rowstore |
    | +---Filter [t.id > 5]                                                                |
    |     IndexRangeScan db1.t, KEY a (a) scan:[a > 5] table_type:sharded_rowstore         |
    +--------------------------------------------------------------------------------------+
    

Joins

  • HashJoin - performs a hash join: MemSQL builds a hash table from the results of the inner side of the join and probes into it while scanning the outer part of the join

  • MergeJoin - performs a merge join: MemSQL scans both inner and outer sides of the join at the same time and merges matching rows

  • NestedLoopJoin - performs a NestedLoop join: for every row on the outer side of the join MemSQL seeks or scans into the inner table to find all the matching rows

    EXPLAIN SELECT * FROM t t1, t t2 WHERE t1.id = t2.a;
    ****
    +-----------------------------------------------------------------------------------------------------------+
    | EXPLAIN                                                                                                   |
    +-----------------------------------------------------------------------------------------------------------+
    | Project [t1.id, t1.a, t1.b, t2.id, t2.a, t2.b] est_rows:1                                                 |
    | HashJoin [t2.a = t1.id]                                                                                   |
    | |---HashTableBuild alias:t2                                                                               |
    | |   Project [remote_1.id, remote_1.a, remote_1.b] est_rows:1                                              |
    | |   Gather partitions:all est_rows:1 alias:remote_1                                                       |
    | |   Project [t2_0.id, t2_0.a, t2_0.b] est_rows:1                                                          |
    | |   TableScan db1.t AS t2_0, PRIMARY KEY (id) table_type:sharded_rowstore est_table_rows:1 est_filtered:1 |
    | Gather partitions:all est_rows:1 alias:t1                                                                 |
    | Project [remote_0.id, remote_0.a, remote_0.b] est_rows:1                                                  |
    | TableScan db1.t AS remote_0, PRIMARY KEY (id) table_type:sharded_rowstore est_table_rows:1 est_filtered:1 |
    +-----------------------------------------------------------------------------------------------------------+
    
    EXPLAIN SELECT * FROM ct t1, ct t2 WHERE t1.a = t2.a;
    ****
    +-----------------------------------------------------------------------------------------------------------------------------------------------+
    | EXPLAIN                                                                                                                                       |
    +-----------------------------------------------------------------------------------------------------------------------------------------------+
    | Gather partitions:all est_rows:1 alias:remote_0                                                                                               |
    | Project [t1.a, t1.b, t2.a AS a_1, t2.b AS b_2] est_rows:1                                                                                     |
    | MergeJoin [t2.a = t1.a]                                                                                                                       |
    | |---ColumnStoreFilter [t2.a IS NOT NULL]                                                                                                      |
    | |   OrderedColumnStoreScan db1.ct AS t2, KEY a (a) USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore est_table_rows:1 est_filtered:1 |
    | +---ColumnStoreFilter [t1.a IS NOT NULL]                                                                                                      |
    |     OrderedColumnStoreScan db1.ct AS t1, KEY a (a) USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore est_table_rows:1 est_filtered:1 |
    +-----------------------------------------------------------------------------------------------------------------------------------------------+
    
    EXPLAIN SELECT * FROM t, ct WHERE t.b = ct.b;
    ****
    +------------------------------------------------------------------------------------------------------------------------------------------+
    | EXPLAIN                                                                                                                                  |
    +------------------------------------------------------------------------------------------------------------------------------------------+
    | Project [t.id, t.a, t.b, ct.a, ct.b] est_rows:1                                                                                          |
    | HashJoin [t.b = ct.b]                                                                                                                    |
    | |---HashTableBuild alias:ct                                                                                                              |
    | |   Project [remote_1.a, remote_1.b] est_rows:1                                                                                          |
    | |   Gather partitions:all est_rows:1 alias:remote_1                                                                                      |
    | |   Project [ct_0.a, ct_0.b] est_rows:1                                                                                                  |
    | |   ColumnStoreScan db1.ct AS ct_0, KEY a (a) USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore est_table_rows:1 est_filtered:1 |
    | Gather partitions:all est_rows:1 alias:t                                                                                                 |
    | Project [remote_0.id, remote_0.a, remote_0.b] est_rows:1                                                                                 |
    | TableScan db1.t AS remote_0, PRIMARY KEY (id) table_type:sharded_rowstore est_table_rows:1 est_filtered:1                                |
    +------------------------------------------------------------------------------------------------------------------------------------------+