There are three types of tables in MemSQL: reference tables, sharded tables, and temporary tables.
Reference tables are relatively small tables that do not need to be distributed and are present on every node in the cluster. They are both created and written to on the master aggregator. Reference tables are implemented via primary-secondary replication to every node in the cluster from the master aggregator. Replication enables reference tables to be dynamic: updates that you perform to a reference table on the master aggregator are quickly reflected on every machine in the cluster.
MemSQL aggregators can take advantage of reference tables’ ubiquity by pushing joins between reference tables and a distributed table onto the leaves. Imagine you have a distributed
clicks table storing billions of records and a smaller
customers table with just a few million records. Since the
customers table is small, it can be replicated on every node in the cluster. If you run a join between the
clicks table and the
customers table, then the bulk of the work for the join will occur on the leaves.
Reference tables are a convenient way to implement dimension tables.
Every database in the distributed system is split into a number of partitions. Each sharded table created is split with hash partitioning over the table’s primary key; a portion of the table is stored in each of its database’s partitions.
Partitions are implemented as databases on the leaves. For example, partition 3 of database
db is stored in a database
db_3 on one of the leaves. For every sharded table created inside
db, a portion of its data will reside in
db_3 on that leaf.
Although sharded tables in the same database share the same database containers on the leaves, no assumptions can be made about particular rows from different tables being co-located on a partition. If you join two tables on the column(s) they are sharded on, MemSQL can perform a co-located join, which will improve the speed of the join.
Temporary tables exist, storing data in memory, for the duration of a client session. MemSQL does not write logs or take snapshots of temporary tables. Temporary tables are designed for temporary, intermediate computations.
Unlike CREATE TABLE without the
CREATE TEMPORARY TABLE can be run on any aggregator, not just the master aggregator. Temporary tables are sharded tables, and can be modified and queried like any “permanent” table, including distributed joins.
Temporary tables are not persisted in MemSQL, which means they have high availability disabled. If a node in a cluster goes down and a failover occurs, all the temporary tables on the cluster lose data. Whenever a query references a temporary table after a node failover, it returns an error. For example,
"Temporary table <table> is missing on leaf <host> due to failover. The table will need to be dropped and recreated."
To prevent loss of data on node failover, use MemSQL tables that have high availability enabled.
Views cannot reference temporary tables because temporary tables only exist for the duration of a client session. Although MemSQL does not materialize views, views are available as a service for all clients, and so cannot depend on client session-specific temporary tables. Temporary tables do not support
- CREATE TABLE
- SHOW TABLES
- SHOW COLUMNS
- ALTER TABLE
- SHOW TABLE STATUS
- FLUSH TABLES
- UNLOCK TABLES
- DROP TABLE