Outdated Version

You are viewing an older version of this section. View current production version.

Code Generation

A large contributor to SingleStore DB’s superior query execution performance is its system of code generation. In contrast to a traditional interpreter-based execution model for databases, SingleStore DB embeds an industrial compiler to produce highly efficient machine code that enables low-level optimizations, which are not possible when executing queries via interpretation alone. And now by default, queries are interpreted first and then asynchronously compiled in the background for use in later executions. This speeds up query execution time for long and complex queries, while at the same time providing efficient query plans for later use.

As stated above, the first time SingleStore DB encounters a given query shape, it will optimize and compile the query asynchronously for future invocations. This incurs minimal overhead which does not depend on the amount of data to be processed, but rather the complexity of the query. The process of code generation involves extracting parameters from the query then transforming the normalized query into a SingleStore DB-specific intermediate representation tailored to the system. Subsequent requests with the same shape can reuse this plan to complete both quickly and consistently.

Code generation applies to all Data Manipulation Language (DML) queries. In addition, SingleStore DB generates code during CREATE TABLE and ALTER TABLE statements. These Data Definition Language (DDL) queries generate code in order to reduce the compilation time of subsequent DML queries against the table.

The Interpreter Mode’s Effects on Code Generation

The interpreter_mode engine variable controls whether SingleStore DB interprets or compiles query shapes, or does both. This section explains how this mode operates when SingleStore DB encounters a new query shape.

Note: llvm is an alias for compile while mbc is an alias for interpret.

interpreter_mode is set to compile

In this mode, SingleStore DB compiles a query’s shape the first time it is encountered. As shown in the following example, the query SELECT * FROM t WHERE col = 1; takes longer to complete on the first run; this is a result of the compile overhead during the first step. Also note that the WHERE clauses are different in the second and third query, but both query shapes are identical. Therefore, SingleStore DB reuses the second query’s plan to run the third query.

SELECT * FROM t WHERE col = 1;
Empty set (0.13 sec)

SELECT * FROM t WHERE col = 1;
Empty set (0.00 sec)

SELECT * FROM t WHERE col = 100000;
Empty set (0.00 sec)

{

Info

Run the PROFILE command followed by the SHOW PROFILE JSON command to view detailed statistics on compilation time (displayed under compile_time_stats in the output).

}

interpreter_mode is set to interpret_first

In this mode, SingleStore DB interprets and compiles a query shape in parallel, upon encountering the query shape for the first time. The query runs in interpreted mode until compilation of the query shape completes. In the following example, the first SELECT * FROM t WHERE col = 1; query runs faster than the first query in the previous example.

SELECT * FROM t WHERE col = 1;
Empty set (0.02 sec)

SELECT * FROM t WHERE col = 1;
Empty set (0.00 sec)

SELECT * FROM t WHERE col = 100000;
Empty set (0.00 sec)

As of MemSQL 6.8, interpreter_mode is set to interpret_first by default.

Understanding the Plancache

After code generation, the compiled query plans are saved for later use in a plancache. Each SingleStore node (aggregator and leaf) has its own plans and plancache. Each user query (on the aggregator) is associated with a plan on that aggregator, and some number of plans (zero, one, or more) on the leaves.

There are two layers of this plancache: the in-memory plancache and the on-disk plancache (located in the @@global.plancachedir directory). Plans are cached in the in-memory plancache until they expire or until that SingleStore node restarts. A plan expires after the interval of time configured by the plan_expiration_minutes global variable has elapsed (the default value of this variable is 720, or 12 hours).

When a plan expires from the in-memory plancache, it remains in the on-disk plancache and is loaded back into memory the next time the query is run. If the global variables enable_disk_plan_expiration and disk_plan_expiration_minutes are set, then a plan that has not been read from disk during the specified disk_plan_expiration_minutes limit will expire from the on-disk plancache and be removed from disk.

To purge all generated plans, both in-memory and on-disk, run the DROP ALL FROM PLANCACHE command. This command invalidates all previously compiled plans, and generates fresh query plans for any new queries.

To see DML query plans that have been cached in the in-memory plancache, run the SHOW PLANCACHE command or query against the information_schema.PLANCACHE table.

For more information on plan expiration and managing disk usage, see Managing Plancache Memory Usage.

Overriding Parameterization of Queries

Parameterization is when the constants in a query are converted into parameters, so that queries with the same structure can reuse the query plan. To override parameterization, use the NOPARAM built-in function.

Codegen Architecture

Code generation produces a specialized programming language, the SingleStore DB Plan Language. MPL is simple, tightly managed and designed specifically for SingleStore DB rather than for general purpose applications. SQL operator trees are converted directly to MPL abstract syntax trees, eliminating the need for expensive parsing and semantic resolution at query compilation time.

In addition to the simplicity of the language, MPL benefits from exposing as primitives many relational operations such as “index seek” and scalar operations such as “BIGINT addition”. These operators are compiled statically along with the server and can easily be combined to form a query plan. In this way, much of the work of compilation can be performed ahead of time.

SingleStore DB Bytecode

When compiling a query, SingleStore DB first flattens MPL into a compact format known as SingleStore DB Bytecode (MBC). Plans in this form can easily be serialized, interpreted or transformed to LLVM bitcode for compilation to x86_64 machine code. Interpreting the bytecode directly nearly eliminates the first-time cost of running a query, but can hamper execution performance:

memsql> select * from t where j > 0;
Empty set (0.05 sec)

memsql> select * from t where j > 0;
Empty set (0.05 sec)

In this example, there is no additional latency on the first request because the query is interpreted.

memsql> select * from t where j > 0;  -- With code generation
Empty set (0.08 sec)

memsql> select * from t where j > 0;  -- Using the cached plan
Empty set (0.02 sec)

Here, the additional latency on the first request is the result of code generation. However, the subsequent request is more than twice as fast as the interpreted execution above. This goes to show the power of SingleStore DB code generation.