Outdated Version

You are viewing an older version of this section. View current production version.

Memory Management

Info

MemSQL Helios does not support setting the engine variables that are discussed in this topic.

This page discusses common memory management topics. It includes the following sections:

Configuring Memory Limits

MemSQL has two engine variables that control its memory usage. Both are measured in megabytes.

maximum_memory

MemSQL will not allocate more than maximum_memory megabytes of memory. If a SELECT query’s memory allocations put the memory usage by MemSQL over maximum_memory, query execution stops and an error message is generated.

Query compilations whose allocations exceed this limit will also terminate the server. See Code Generation for more information on query compilations. By default, maximum_memory is set to 90% of the physical memory on the host machine if no swap is enabled, and 100% of physical memory on the machine if swap is enabled.

maximum_table_memory

MemSQL will not allow writes to any table once the cumulative memory in use by all tables in MemSQL reaches maximum_table_memory (MemSQL will become read-only). SELECT and DELETE queries will still be allowed even once the limit is reached. UPDATE, INSERT , CREATE TABLE, ALTER TABLE, CREATE INDEX or DROP INDEX statements will fail with an error message once the limit has been reached. Query compilations are also disallowed once the limit is reached.

This setting is designed to allow SELECT queries to allocate temporary memory for sorting, hash group-by, and so on. The maximum_table_memory must be set to a value lower then maximum_memory. By default, maximum_table_memory is set to 90% of maximum_memory, which translates to about 80% of physical memory on the host machine.

If the maximum_table_memory limit has been reached, DELETE queries can still be executed to remove data from the table; however large DELETE queries may fail if the memory used by MemSQL reaches maximum_memory.

Caution should be taken as DELETE queries allocate extra memory to mark rows as deleted. For rowstore tables, this equates to roughly 40 + 8*number_of_indexes bytes per deleted row. For columnstore tables, the memory usage will be lower because of how rows are marked to be deleted (roughly num_rows_in_table/8 bytes if you delete a row in every segment file in the table).

If the table is narrow, such as containing a small number of int columns, DELETE queries will show up as a relatively large spike in memory usage compared to the size of the table.

The memory for a deleted row is reclaimed after the transaction commits and the memory is freed asynchronously by the garbage collector. For more information, see the DELETE reference topic.

Replicating databases will pause if memory use reaches maximum_table_memory while replicating data. If memory becomes available again - say some data is deleted - replication will automatically continue.

buffer_manager_cached_memory

Warning

This value cannot be modified by users.

MemSQL caches memory (allocated by the Buffer Manager) that is not in use, instead of returning it to the system. The buffer_manager_cached_memory variable keeps a track of this cached memory. Let’s say a query consumes XXX GBs of memory for execution. Once the query execution completes, instead of being returned to the system (Linux), this memory is stored as the Buffer Manager cached memory. Thereafter, if some other query needs memory, it will get that memory from the cache. This process avoids system call/contention over access to memory, which can be substantial. The size of memory reserved for each node in the cache can be viewed in the output of the SHOW STATUS EXTENDED command for each node. For example,

SHOW STATUS EXTENDED LIKE 'buffer_manager_cached%';
****
+------------------------------+---------+
| Variable_name                |  Value  |
+------------------------------+---------+
| Buffer_manager_cached_memory | 80.0 MB |
+------------------------------+---------+

buffer_manager_cached_memory has following limitations:

  • buffer_manager_cached_memory is capped at 25% of the maximum_memory.
  • Buffers will only be cached if the total_server_memory is less than 90% of the maximum_memory

For more information on buffer_manager_cached_memory and total_server_memory, see the Summary Variables topic.

Maximum Table Memory Reached

If you see the error message below, the leaf at 'leafhost':leafport has reached maximum_table_memory, which is the maximum cumulative memory in use by all tables on a node. MemSQL will not allow writes to any table once maximum_table_memory is reached (MemSQL will become read-only).

Leaf Error (leafhost:leafport): Memory usage by MemSQL for tables (##### MB) has reached the value of 'maximum_table_memory' global variable (##### MB). This query cannot be executed.

You can resolve this issue with the steps below.

Check for Data Skew

First, determine which nodes are reaching their memory capacity. Do this by checking for data skew across your nodes. If skew is present (significantly higher memory consumption on specific nodes), you should re-evaluate your shard keys. If no skew is present, you should increase cluster memory or delete data. There are two ways to check for skew:

  1. Execute the following query to view per-table memory consumption and row counts in your MemSQL cluster for tables with more than 10000 rows:

    SELECT
        DATABASE_NAME,
        TABLE_NAME,
        MIN(ROWS),
        MAX(ROWS),
        FLOOR(AVG(ROWS)) AS avg_rows,
        ROUND(STDDEV(ROWS)/AVG(ROWS),3) AS row_skew,
        MIN(MEMORY_USE),
        MAX(MEMORY_USE),
        SUM(MEMORY_USE)/(1024*1024) AS total_memory_mb,
        FLOOR(AVG(MEMORY_USE)) AS avg_memory,
        ROUND(STDDEV(MEMORY_USE)/AVG(MEMORY_USE),3) AS memory_skew
    FROM INFORMATION_SCHEMA.TABLE_STATISTICS
    GROUP BY 1, 2
    HAVING SUM(ROWS) > 10000;
    
  2. With any version of MemSQL, you can manually compare maximum_table_memory and Alloc_table_memory (memory used by tables) on each node:

    mysql> SHOW VARIABLES LIKE "maximum_%";
    +----------------------+-------+
    | Variable_name        | Value |
    +----------------------+-------+
    | maximum_memory       | 48291 |
    | maximum_table_memory | 43462 |
    +----------------------+-------+
    
    mysql> SHOW STATUS EXTENDED LIKE "Alloc_table_memory";
    +--------------------+----------+
    | Variable_name      | Value    |
    +--------------------+----------+
    | Alloc_table_memory | 43462 MB |
    +--------------------+----------+
    1 row in set (0.01 sec)
    

If memory consumption is significantly higher on specific nodes, skew is present. If memory consumption is fairly uniform across leaves, no skew is present.

More Examples of Finding Data Skew

  • Show skew across all partitions for a single table. Fill in <table_name>.

    SELECT database_name,
           table_name,
           ordinal AS PARTITION_ID,
           rows,
           memory_use
    FROM   information_schema.table_statistics
    WHERE  table_name = '<table_name>';
    
  • Show per-partition skew for columns in a group by. Helps identify a potential bottleneck after a subquery with a group by. If skew factor is high, that indicates that some groups have very high cardinality and some have very low cardinality and can lead to long single-threaded operations. Fill in <table_name>.

    SELECT   ROUND(STDDEV(c)/AVG(c),3)*100 AS group_skew,
             PARTITION_ID()
    FROM     (
                      SELECT   col1,
                               col2,
                               ...,
                               coln,
                               count(*)
                      FROM     <table_name>
                      GROUP BY 1, 2) sub
    GROUP BY PARTITION_ID();
    
  • Show data distribution for a table if you were to reshard on another column or set of columns. Fill in <table_name> and <potential_shard_key>.

    SELECT   WITH(leaf_pushdown=true) SUM(c) rows,
             PARTITION_ID()                  partition_id
    FROM     (
                      SELECT   count(*) c
                      FROM     <table_name>
                      GROUP BY <potential_shard_key>) reshuffle
    GROUP BY PARTITION_ID();
    

If Skew is Present

MemSQL distributes data based on the SHARD KEY specified in a table’s schema. If no SHARD KEY is explicitly specified, the PRIMARY KEY is used as the default SHARD KEY. If the SHARD KEY shards on a low cardinality column, data will accumulate on certain nodes. To resolve this, you should export your data, modify your schema, and then reload your data. To export your data, see Exporting Data From MemSQL. After you backup your data, you can DROP TABLE low_cardinality_shard_key_table, and then CREATE TABLE with a higher cardinality SHARD KEY. When you reload the data, MemSQL will automatically shard your data based on the new SHARD KEY.

Example:

$ mysqldump -h 127.0.0.1 -u root my_db low_cardinality_shard_key_table --no-create-info > databackup.sql

memsql> DROP TABLE low_cardinality_shard_key_table;

memsql> source updated_shard_key_schema.sql;

If No Skew is Present

If nodes are reaching maximum_table_memory without skew, the cluster itself is nearing maximum memory capacity. To resolve this, you need to add capacity or delete data. To add capacity, see Administering a Cluster. If additional leaf nodes are not available, use the instructions below to delete data in batches. DELETE queries use 50 bytes of memory per record deleted until the DELETE commits, so the maximum_table_memory needs to be temporarily increased on all nodes during the DELETE.

On each leaf node in your cluster, do the following:

  1. Pause write workloads from an aggregator or your application.

  2. Check current maximum_memory and maximum_table_memory.

    memsql> select @@global.maximum_memory
    | @@global.maximum_memory |
    +-------------------------+
    |                   65536 |
    +-------------------------+
    1 row in set (0.00 sec)
    
    memsql> select @@global.maximum_table_memory
    +-------------------------------+
    | @@global.maximum_table_memory |
    +-------------------------------+
    |                         57927 |
    +-------------------------------+
    1 row in set (0.00 sec)
    
  3. Increase maximum_table_memory to 95% of maximum_memory. The default value is 90% of maximum_memory.

    memsql> set @@global.maximum_table_memory=62259;
    Query OK, 0 rows affected (0.00 sec)
    
  4. Execute small batches of DELETE queries. Limit to 100,000 records or less to reduce memory used until commit.

    memsql> DELETE FROM mytable limit 100000;
    Query OK, 0 rows affected (1.64 sec)
    
  5. Restore maximum_table_memory to original value.

    memsql> set @@global.maximum_table_memory=57927;
    Query OK, 0 rows affected (0.00 sec)
    

Linux Out of Memory Behavior

Linux will kill processes using a lot of memory when the amount of free memory on the system is too low. MemSQL is often the target process killed. To check if MemSQL was killed by Linux as a result of the system running out of memory, run:

$ dmesg | grep -i "out of memory"

If the server has rebooted since MemSQL was killed, you can find the logs from the last boot by running:

$ cat /var/log/kern.log

Linux can be configured to disable this behaviour by changing the setting of vm.overcommit_memory or vm.overcommit_ratio, but this is not recommended. It is safer to configure MemSQL to use less memory by setting maximum_memory to a lower value.

Freeing Table Memory

Table memory can only be freed back to the system when large chunks of data is deleted. Small deletes usually do not free the memory back to the system, as the memory is still available to MemSQL for additional writes. This is due to the structuring of the in-memory indexes where the chunks can only be freed if they are completely empty, that is when every row in the chunk has been deleted. It is difficult to quantify the amount of data that needs to be deleted since it depends on number of factors like the number of rows in one of these chunks depends on the table schema, whether or not a chunk gets completely emptied depends on the exact pattern of inserts and deletes, and so on. This enables optimization for improved efficiency and performance.

Currently, there is no method or allocator to track the ‘real memory use’ versus ‘free memory but still allocated to MemSQL’. However, table memory can be freed by running commands like OPTIMIZE TABLE, TRUNCATE, DROP TABLE, and DELETE. For example, the OPTIMIZE TABLE..FULL command moves rows from partially-full chunks of memory in order to consolidate the data into full and empty chunks, thus allowing the newly-created empty chunks to be freed back to the system. The ‘used by MemSQL’ memory and the free system memory is as accurate as possible, and can be referred to know the amount of free memory available on the hosts. However, it has its own limitations like it uses a lot of CPU and is a time consuming process.