You are viewing an older version of this section. View current production version.
Memory Management
MemSQL Helios does not support setting the engine variables that are discussed in this topic.
This page discusses common memory management topics. It includes the following sections:
- Configuring Memory Limits
- Maximum Table Memory Reached
- Linux Out of Memory Behavior
- Freeing Table Memory
Configuring Memory Limits
MemSQL has two engine variables that control its memory usage. Both are measured in megabytes.
maximum_memory
MemSQL will not allocate more than maximum_memory
megabytes of memory. If a SELECT
query’s memory allocations put the memory usage by MemSQL over maximum_memory
, query execution stops and an error message is generated.
Query compilations whose allocations exceed this limit will also terminate the server. See Code Generation for more information on query compilations. By default, maximum_memory
is set to 90% of the physical memory on the host machine if no swap is enabled, and 100% of physical memory on the machine if swap is enabled.
maximum_table_memory
MemSQL will not allow writes to any table once the cumulative memory in use by all tables in MemSQL reaches maximum_table_memory
(MemSQL will become read-only). SELECT
and DELETE
queries will still be allowed even once the limit is reached. UPDATE
, INSERT
, CREATE TABLE
, ALTER TABLE
, CREATE INDEX
or DROP INDEX
statements will fail with an error message once the limit has been reached. Query compilations are also disallowed once the limit is reached.
This setting is designed to allow SELECT queries to allocate temporary memory for sorting, hash group-by, and so on. The maximum_table_memory
must be set to a value lower then maximum_memory
. By default, maximum_table_memory
is set to 90% of maximum_memory
, which translates to about 80% of physical memory on the host machine.
If the maximum_table_memory
limit has been reached, DELETE
queries can still be executed to remove data from the table; however large DELETE
queries may fail if the memory used by MemSQL reaches maximum_memory
.
Caution should be taken as DELETE
queries allocate extra memory to mark rows as deleted. For rowstore tables, this equates to roughly 40 + 8*number_of_indexes bytes per deleted row. For columnstore tables, the memory usage will be lower because of how rows are marked to be deleted (roughly num_rows_in_table/8 bytes if you delete a row in every segment file in the table).
If the table is narrow, such as containing a small number of int columns, DELETE
queries will show up as a relatively large spike in memory usage compared to the size of the table.
The memory for a deleted row is reclaimed after the transaction commits and the memory is freed asynchronously by the garbage collector. For more information, see the DELETE reference topic.
Replicating databases will pause if memory use reaches maximum_table_memory
while replicating data. If memory becomes available again - say some data is deleted - replication will automatically continue.
buffer_manager_cached_memory
This value cannot be modified by users.
MemSQL caches memory (allocated by the Buffer Manager) that is not in use, instead of returning it to the system. The buffer_manager_cached_memory
variable keeps a track of this cached memory. Let’s say a query consumes XXX GBs of memory for execution. Once the query execution completes, instead of being returned to the system (Linux), this memory is stored as the Buffer Manager cached memory. Thereafter, if some other query needs memory, it will get that memory from the cache. This process avoids system call/contention over access to memory, which can be substantial. The size of memory reserved for each node in the cache can be viewed in the output of the SHOW STATUS EXTENDED command for each node. For example,
SHOW STATUS EXTENDED LIKE 'buffer_manager_cached%';
****
+------------------------------+---------+
| Variable_name | Value |
+------------------------------+---------+
| Buffer_manager_cached_memory | 80.0 MB |
+------------------------------+---------+
buffer_manager_cached_memory
has following limitations:
buffer_manager_cached_memory
is capped at 25% of themaximum_memory
.- Buffers will only be cached if the
total_server_memory
is less than 90% of themaximum_memory
For more information on buffer_manager_cached_memory
and total_server_memory
, see the Summary Variables topic.
Maximum Table Memory Reached
If you see the error message below, the leaf at 'leafhost':leafport
has reached maximum_table_memory
, which is the maximum cumulative memory in use by all tables on a node. MemSQL will not allow writes to any table once maximum_table_memory
is reached (MemSQL will become read-only).
Leaf Error (leafhost:leafport): Memory usage by MemSQL for tables (##### MB) has reached the value of 'maximum_table_memory' global variable (##### MB). This query cannot be executed.
You can resolve this issue with the steps below.
Check for Data Skew
First, determine which nodes are reaching their memory capacity. Do this by checking for data skew across your nodes. If skew is present (significantly higher memory consumption on specific nodes), you should re-evaluate your shard keys. If no skew is present, you should increase cluster memory or delete data. There are two ways to check for skew:
-
Execute the following query to view per-table memory consumption and row counts in your MemSQL cluster for tables with more than 10000 rows:
SELECT DATABASE_NAME, TABLE_NAME, MIN(ROWS), MAX(ROWS), FLOOR(AVG(ROWS)) AS avg_rows, ROUND(STDDEV(ROWS)/AVG(ROWS),3) AS row_skew, MIN(MEMORY_USE), MAX(MEMORY_USE), SUM(MEMORY_USE)/(1024*1024) AS total_memory_mb, FLOOR(AVG(MEMORY_USE)) AS avg_memory, ROUND(STDDEV(MEMORY_USE)/AVG(MEMORY_USE),3) AS memory_skew FROM INFORMATION_SCHEMA.TABLE_STATISTICS GROUP BY 1, 2 HAVING SUM(ROWS) > 10000;
-
With any version of MemSQL, you can manually compare
maximum_table_memory
andAlloc_table_memory
(memory used by tables) on each node:mysql> SHOW VARIABLES LIKE "maximum_%"; +----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | maximum_memory | 48291 | | maximum_table_memory | 43462 | +----------------------+-------+ mysql> SHOW STATUS EXTENDED LIKE "Alloc_table_memory"; +--------------------+----------+ | Variable_name | Value | +--------------------+----------+ | Alloc_table_memory | 43462 MB | +--------------------+----------+ 1 row in set (0.01 sec)
If memory consumption is significantly higher on specific nodes, skew is present. If memory consumption is fairly uniform across leaves, no skew is present.
More Examples of Finding Data Skew
-
Show skew across all partitions for a single table. Fill in
<table_name>
.SELECT database_name, table_name, ordinal AS PARTITION_ID, rows, memory_use FROM information_schema.table_statistics WHERE table_name = '<table_name>';
-
Show per-partition skew for columns in a group by. Helps identify a potential bottleneck after a subquery with a group by. If skew factor is high, that indicates that some groups have very high cardinality and some have very low cardinality and can lead to long single-threaded operations. Fill in
<table_name>
.SELECT ROUND(STDDEV(c)/AVG(c),3)*100 AS group_skew, PARTITION_ID() FROM ( SELECT col1, col2, ..., coln, count(*) FROM <table_name> GROUP BY 1, 2) sub GROUP BY PARTITION_ID();
-
Show data distribution for a table if you were to reshard on another column or set of columns. Fill in
<table_name>
and<potential_shard_key>
.SELECT WITH(leaf_pushdown=true) SUM(c) rows, PARTITION_ID() partition_id FROM ( SELECT count(*) c FROM <table_name> GROUP BY <potential_shard_key>) reshuffle GROUP BY PARTITION_ID();
If Skew is Present
MemSQL distributes data based on the SHARD KEY specified in a table’s schema. If no SHARD KEY is explicitly specified, the PRIMARY KEY is used as the default SHARD KEY. If the SHARD KEY shards on a low cardinality column, data will accumulate on certain nodes. To resolve this, you should export your data, modify your schema, and then reload your data. To export your data, see Exporting Data From MemSQL. After you backup your data, you can DROP TABLE low_cardinality_shard_key_table
, and then CREATE TABLE
with a higher cardinality SHARD KEY. When you reload the data, MemSQL will automatically shard your data based on the new SHARD KEY.
Example:
$ mysqldump -h 127.0.0.1 -u root my_db low_cardinality_shard_key_table --no-create-info > databackup.sql
memsql> DROP TABLE low_cardinality_shard_key_table;
memsql> source updated_shard_key_schema.sql;
If No Skew is Present
If nodes are reaching maximum_table_memory
without skew, the cluster itself is nearing maximum memory capacity. To resolve this, you need to add capacity or delete data. To add capacity, see Administering a Cluster. If additional leaf nodes are not available, use the instructions below to delete data in batches. DELETE
queries use 50 bytes of memory per record deleted until the DELETE
commits, so the maximum_table_memory
needs to be temporarily increased on all nodes during the DELETE
.
On each leaf node in your cluster, do the following:
-
Pause write workloads from an aggregator or your application.
-
Check current
maximum_memory
andmaximum_table_memory
.memsql> select @@global.maximum_memory | @@global.maximum_memory | +-------------------------+ | 65536 | +-------------------------+ 1 row in set (0.00 sec) memsql> select @@global.maximum_table_memory +-------------------------------+ | @@global.maximum_table_memory | +-------------------------------+ | 57927 | +-------------------------------+ 1 row in set (0.00 sec)
-
Increase
maximum_table_memory
to 95% ofmaximum_memory
. The default value is 90% ofmaximum_memory
.memsql> set @@global.maximum_table_memory=62259; Query OK, 0 rows affected (0.00 sec)
-
Execute small batches of DELETE queries. Limit to 100,000 records or less to reduce memory used until commit.
memsql> DELETE FROM mytable limit 100000; Query OK, 0 rows affected (1.64 sec)
-
Restore
maximum_table_memory
to original value.memsql> set @@global.maximum_table_memory=57927; Query OK, 0 rows affected (0.00 sec)
Linux Out of Memory Behavior
Linux will kill processes using a lot of memory when the amount of free memory on the system is too low. MemSQL is often the target process killed. To check if MemSQL was killed by Linux as a result of the system running out of memory, run:
$ dmesg | grep -i "out of memory"
If the server has rebooted since MemSQL was killed, you can find the logs from the last boot by running:
$ cat /var/log/kern.log
Linux can be configured to disable this behaviour by changing the setting of vm.overcommit_memory
or vm.overcommit_ratio
, but this is not recommended. It is safer to configure MemSQL to use less memory by setting maximum_memory
to a lower value.
Freeing Table Memory
Table memory can only be freed back to the system when large chunks of data is deleted. Small deletes usually do not free the memory back to the system, as the memory is still available to MemSQL for additional writes. This is due to the structuring of the in-memory indexes where the chunks can only be freed if they are completely empty, that is when every row in the chunk has been deleted. It is difficult to quantify the amount of data that needs to be deleted since it depends on number of factors like the number of rows in one of these chunks depends on the table schema, whether or not a chunk gets completely emptied depends on the exact pattern of inserts and deletes, and so on. This enables optimization for improved efficiency and performance.
Currently, there is no method or allocator to track the ‘real memory use’ versus ‘free memory but still allocated to MemSQL’. However, table memory can be freed by running commands like OPTIMIZE TABLE, TRUNCATE, DROP TABLE, and DELETE.
For example, the OPTIMIZE TABLE..FULL
command moves rows from partially-full chunks of memory in order to consolidate the data into full and empty chunks, thus allowing the newly-created empty chunks to be freed back to the system. The ‘used by MemSQL’ memory and the free system memory is as accurate as possible, and can be referred to know the amount of free memory available on the hosts. However, it has its own limitations like it uses a lot of CPU and is a time consuming process.