Performs optimizations on the storage of rows in a table.
Optimizing Columnstore Tables
This topic assumes familiarity with the concepts of columnstore table layout, including segments, row segment groups, and background mergers. For more on these topics, see Managing Columnstore Segments.
MemSQL automatically runs optimization routines on tables, but they can also be started manually.
OPTIMIZE TABLE table_name [FULL | FLUSH]
This command can only be run against one table at a time. See Deprecations for more details.
The types of optimization performed by
OPTIMIZE TABLE are discussed below.
- Without any arguments,
OPTIMIZE TABLEruns a Manual merge.
- With the
OPTIMIZE TABLEruns a Manual flush. It does not run a Manual merge like
- With the
OPTIMIZE TABLEruns both a Manual merge - Full and a Manual flush (in MemSQL 6.0 and later). In MemSQL 5.8 and earlier, it runs only a Manual merge - Full.
There are a few types of optimization performed on columnstore tables:
- Automatic background optimization - MemSQL has background threads that automatically optimize columnstore indexes as needed. These optimizations are run in a single thread to minimize impact to concurrent query workloads. The background optimization will attempt to keep the number of row segment groups low, but will not attempt to create a single row segment group due to the cost of this operation.
- Manual merge - The
OPTIMIZE TABLE table_namestatement runs a foreground optimization which uses multiple threads to complete the process as fast as possible, unlike automatic background optimization which only uses a single thread. As a result, this may negatively impact the performance of other concurrent workloads.
- Manual merge - Full - With the
OPTIMIZE TABLEcommand sorts the entire table, creating a single sorted row segment group. This is more expensive and therefore takes longer than a regular manual optimization. The potential benefit is that because it sorts the entire table into a single sorted row segment group instead of a small number of them, read queries may perform faster. However, this benefit diminishes as writes are made to the table. See the Managing Columnstore Segments section for more information.
- Manual merge - Full - With the
- Manual flush - This
OPTIMIZE TABLE table_name FLUSHstatement will flush any rows in the in-memory rowstore-format row segment group into one or more columnstore-format row segment groups. Rows in the in-memory rowstore-format group are automatically flushed to the columnstore format in the background, so this command is generally not necessary. Some potential benefits are that this may be helpful if your rowstore segment is taking up a lot of memory that you would like to free up for other purposes, and putting all data into the columnar format may allow queries to run faster.
See Managing Columnstore Segments for details.
Optimizing Rowstore Tables
For rowstore tables,
OPTIMIZE TABLE physically sorts the data in memory by the primary key and optimizes the organization of the table’s internal structures, so primary key table scans will execute significantly faster. However, the benefit of improved query performance diminishes as writes are made to the table. Additionally, memory usage is reduced through defragmentation. It also triggers garbage collection on the table.
OPTIMIZE TABLE table_name
FLUSHflag is not a valid option for rowstore tables.
FULLflag is not a valid option for rowstore tables; the syntax is accepted but it will be ignored by MemSQL.
OPTIMIZE TABLEcan be particularly useful for reducing memory usage and improving query performance in cases where a leaf has been online for a long time, or if the workload is highly transactional.
- This can be a costly command in terms of time and CPU use if many rows need to be reordered to return the table to ordered memory, especially if it is being run during a high traffic time.
- MemSQL supports online
OPTIMIZE TABLE, which means that you can read and write while the table is being optimized.
OPTIMIZE TABLEon a sharded table is always executed online. Note that online
OPTIMIZE TABLEwill not begin optimizing the table, but it will wait until all DML queries that were already running on the table finish. This allows any in-progress queries to complete execution before optimizing the table, and ensures consistency of results from queries on the table since the time of execution of
OPTIMIZE TABLE. As soon as the in-progress reads and writes complete and the
OPTIMIZE TABLEcommand begins optimizing the table, new reads and writes will proceed as normal. This blocking period usually lasts on the order of milliseconds.
- If you are running frequent
OPTIMIZE TABLEstatements on a table and have a lot of long-running queries on that table, then your normal workload may experience some periods of delay since it blocks other queries from starting while it waits for completion of long-running queries.