ISSUE: Inconsistent Query Run Times
If you receive very different response times when you run the same query (or groups of queries on the same leaf) repeatedly, Transparent Huge Pages (THP) may be the cause.
Linux organizes RAM into pages that are ordinarily 4KB. With THP, Linux can instead use 2MB pages or larger. But THP transparently re-organizes memory used by a process inside the kernel, merging some small pages to huge pages, and splitting some huge pages to small pages. This background process holds locks on the memory manager, and can prevent your process from accessing memory.
You can check for THP by running `cat /sys/kernel/mm/*transparent_hugepage/enabled`. If the brackets are around `[always]`, you should disable THP on all nodes in your cluster. Fully remedying the issue may require restarting MemSQL, or fully restarting all of the servers. Cluster operators should disable THP before launching in production.
Note: Having THP enabled may also be the cause of high system CPU (sometimes called red CPU).
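The bracket check can be scripted. The `check_thp` helper below is a sketch (the function name is ours, not a system tool); it only inspects the text that the sysfs file would contain:

```shell
# Sketch: decide whether THP must be disabled, given the contents of
# /sys/kernel/mm/*transparent_hugepage/enabled (the active value is bracketed).
check_thp() {
  case "$1" in
    *'[always]'*) echo "enabled" ;;   # THP active: disable it on every node
    *)            echo "disabled" ;;  # already madvise/never
  esac
}

# On a live host you would run:
#   check_thp "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"
check_thp "always madvise [never]"   # prints "disabled"
```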
How to Disable THP
MemSQL recommends disabling transparent huge pages (THP) at boot time on all nodes (master aggregator, child aggregators, and leaves) in the cluster. Not doing so may result in inconsistent query run times.
To disable THP, add the following lines to the end of `rc.local` before the exit line (if present), and reboot the host machines.
For Debian-based distributions, add the following to `rc.local`:

```
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
echo no > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
```
For RedHat/CentOS distributions, add the following to `rc.local`:

```
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
echo 0 > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
```
The `khugepaged/defrag` option will be `0` on newer Linux versions (e.g. CentOS 7+), but `no` on older versions (e.g. CentOS 6). You can check which your system uses by running `cat /sys/kernel/mm/*transparent_hugepage/khugepaged/defrag`, and keep only the matching setting. For example, if you see `0`, keep the line with `echo 0`; if you see `no`, keep the line with `echo no`.
ISSUE: Long Running Queries Blocking DDL Operations and Workload
MemSQL supports online DDL statements, which means that you can perform read and write operations while a DDL command is being executed on a table. The execution of a DDL command does not begin until all in-progress DML queries running on the table finish. As soon as the in-progress reads and writes complete and the DDL command begins executing, new reads and writes proceed as normal. This blocking period usually lasts on the order of milliseconds. Long running DML queries in your workload can extend the blocking period, because the DDL must wait until those queries complete. The DDL waits and blocks until `default_distributed_ddl_timeout` is reached (default value is 180000 milliseconds, or 3 minutes), after which it throws a timeout error. If you frequently run DDL statements and have many long-running queries, the workload is negatively impacted, since the DDL operation blocks other queries from starting while it waits for the long-running queries to complete.
Increasing the timeout duration for a DDL operation is not recommended, since it delays the start of the operation. While the DDL operation is delayed from starting, it continues to block new queries from executing. Therefore, raising the timeout can cause longer delays for regular queries in your workload.
Optimize your workload. If you have long running queries, you can optimize your current workload by running frequent maintenance and backups, or by rescheduling parts of the workload. For example, run analytical queries during the lowest-traffic time. Additionally, you can use cluster replication to replicate the data to a secondary read-only cluster, and then run all long running or analytical queries on the secondary cluster. This drastically reduces the load on the primary cluster and improves workload performance.
Use the `TRUNCATE` command instead of the `DELETE` command to remove rows from a table. `TRUNCATE` is the preferred method if you want to remove all rows of a large table, since it instantly frees up the memory. In contrast, `DELETE` does not instantly remove the deleted rows from memory; it marks the rows for deletion, and a background garbage collection job periodically clears out the marked rows. Each marked row consumes memory, which may result in memory allocation issues for a large table.
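For example, to clear out a large table (the table name is illustrative):

```sql
-- Preferred for clearing a large table: instantly frees the memory.
TRUNCATE TABLE events;

-- Avoid for full-table cleanup: rows are only marked for deletion and
-- reclaimed later by the background garbage collector, holding memory
-- in the meantime.
DELETE FROM events;
```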
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Rowlock timeout errors can occur in an application because of contention with transactions or uncommitted transactions. Refer to the following sections to resolve lock wait timeout errors.
Updates and deletes in SingleStore are row locking operations. If a row is currently locked by query `q1` running in transaction `t1`, a second query `q2` in transaction `t2` that operates on the same row will be blocked until `t1` commits or rolls back.
Updates and deletes on a columnstore table lock the entire table if the number of rows being updated is greater than `columnstore_disk_insert_threshold`. See Locking in Columnstores for more information.
The “Lock wait timeout exceeded; try restarting transaction” error will occur when a query cannot proceed because it is blocked by a rowlock.
Deadlock between Transactions
Typically, a deadlock happens when two or more transactions write to the same rows, but in a different order. For example, consider two concurrently running queries `q1` and `q2` in different transactions, where both `q1` and `q2` want to write to rows `r1` and `r2`. If query `q1` wants to write to row `r1` and then `r2`, but query `q2` wants to write to row `r2` first and then `r1`, they will deadlock.
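The deadlocking write order can be sketched as follows (the table and row ids are illustrative, not from the original example):

```sql
-- Session 1 (transaction t1):
START TRANSACTION;
UPDATE t SET v = 1 WHERE id = 1;   -- locks row r1

-- Session 2 (transaction t2):
START TRANSACTION;
UPDATE t SET v = 2 WHERE id = 2;   -- locks row r2

-- Session 1: blocks, waiting for r2 (held by t2)
UPDATE t SET v = 1 WHERE id = 2;

-- Session 2: blocks, waiting for r1 (held by t1) -> deadlock
UPDATE t SET v = 2 WHERE id = 1;
```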
A single transaction that runs multiple statements should not deadlock itself, because queries are executed in the order they are defined.
To resolve contention when two or more transactions are writing to the same row (in the same order), add indexes on the columns being updated. Adding indexes to the columns ensures that the row locks are taken in the same order that the write statements are specified in each transaction. Note: This solution will not resolve a deadlock (defined previously).
For example, if the following query generates a lock wait timeout exceeded error:

```sql
DELETE FROM bucket WHERE meta_uid_id = ?
```

Add an index on `meta_uid_id`.
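A sketch of the fix (using the `bucket` table from the example above):

```sql
-- Add an index on the column used in the WHERE clause so row locks are
-- taken in a consistent, index-determined order.
ALTER TABLE bucket ADD INDEX (meta_uid_id);
```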
Consider the following `INSERT ... ON DUPLICATE KEY UPDATE` statement:

```sql
INSERT INTO allotments (DevID, DevName, Orders)
SELECT * FROM deprAllot
ON DUPLICATE KEY UPDATE DevID = VALUES(DevID), Orders = VALUES(Orders);
```
To reduce the chance of a lock wait timeout exceeded error, add individual indexes to the columns being updated (here, `DevID` and `Orders`).
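A sketch of the corresponding DDL (using the `allotments` table from the example above):

```sql
-- Index each column referenced in the ON DUPLICATE KEY UPDATE clause.
ALTER TABLE allotments ADD INDEX (DevID);
ALTER TABLE allotments ADD INDEX (Orders);
```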
You can decrease the `lock_wait_timeout` value to more quickly fail a query that is blocked by a lock.
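For example (a sketch; the value of 10 seconds is illustrative, pick one that suits your workload):

```sql
-- Fail blocked queries after 10 seconds instead of the default 60.
SET GLOBAL lock_wait_timeout = 10;
```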
To resolve a deadlock (defined above), change the flow of your application logic to execute queries that write to the same rows in the same order.
You can also disable `multistatement_transactions` to prevent deadlocks, but this is not recommended because it can lead to data loss or other serious issues.
Run

```sql
SHOW VARIABLES LIKE '%multistatement_transactions%';
```

to see whether multistatement transactions are on, and

```sql
SET GLOBAL multistatement_transactions = off;
```

to turn them off.
Disabling `multistatement_transactions` turns off cross-node transactions for write queries. Hence, write queries that span multiple nodes commit locally when processing finishes on each node.
```
sdb-admin update-config --key multistatement_transactions --value off --set-global --memsql-id <aggregator_id>
```

```
memsql-ops memsql-update-config --key multistatement_transactions --value off --set-global <aggregator_id>
```
By turning off multi-statement transactions, write queries that touch multiple nodes will commit locally when the processing finishes on the node. For example, if node 1 hits a duplicate key error, node 2 will still commit its transaction.
Open transactions hold locks on the rows affected by the transaction until they are committed or rolled back, and any other write query modifying the same rows has to wait for the open transaction to release the locks. If a query has to wait for more than the `lock_wait_timeout` (default 60 seconds), it fails. This happens most often when the open transaction is idle and unnecessarily holding the locks.
- To resolve the issue, identify the idle transaction and kill its connection.
- To prevent the issue from recurring, ensure that all the transactions are committed or rolled back.
Resolving the current locking issue
If a query has failed because it waited long enough to exceed the `lock_wait_timeout`, identify the transaction that is causing the timeout, and kill its connection. Killing the connection rolls back the uncommitted writes of the open transaction. For example, a write operation is performed in the following transaction, but it is not committed.
```sql
START TRANSACTION;
UPDATE cust SET ORDERS = (ORDERS + 1) WHERE ID > 7680;
```
In another connection start a transaction, and run a query that tries to update the same set of rows as the transaction above.
```sql
START TRANSACTION;
UPDATE cust SET ORDER_DATE = DATE_ADD(ORDER_DATE, INTERVAL 1 DAY) WHERE ID > 7680;
```

```
ERROR 1205 (HY000): Leaf Error (192.168.3.152:3307): Lock wait timeout exceeded; try restarting transaction. Lock owned by connection id 76, query `open idle transaction`
```
To get the list of all running transactions, query the `INFORMATION_SCHEMA.MV_PROCESSLIST` table. This management view contains the processlist information for all nodes in the cluster, and you can join it with the `INFORMATION_SCHEMA.MV_NODES` table to identify the node on which each process is running.
```sql
SELECT TYPE, IP_ADDR, PORT, mvp.ID, USER, HOST, COMMAND, TIME,
       TRANSACTION_STATE, ROW_LOCKS_HELD
FROM INFORMATION_SCHEMA.MV_PROCESSLIST mvp
JOIN INFORMATION_SCHEMA.MV_NODES mvn ON mvp.NODE_ID = mvn.ID
WHERE TRANSACTION_STATE = "open" AND COMMAND = "Sleep" AND TIME > 60
ORDER BY TYPE;
```

```
+------+---------------+------+-----+-------------+---------------------+---------+------+-------------------+----------------+
| TYPE | IP_ADDR       | PORT | ID  | USER        | HOST                | COMMAND | TIME | TRANSACTION_STATE | ROW_LOCKS_HELD |
+------+---------------+------+-----+-------------+---------------------+---------+------+-------------------+----------------+
| CA   | 192.168.1.146 | 3306 | 143 | root        | localhost:58322     | Sleep   |  385 | open              |              0 |
| LEAF | 192.168.3.152 | 3307 |  76 | distributed | 192.168.1.146:9555  | Sleep   |  385 | open              |           5168 |
| LEAF | 192.168.0.24  | 3307 | 125 | distributed | 192.168.1.146:7371  | Sleep   |  385 | open              |           5047 |
| LEAF | 192.168.0.24  | 3307 | 104 | distributed | 192.168.1.146:60874 | Sleep   |  385 | open              |           5001 |
| LEAF | 192.168.0.24  | 3307 | 115 | distributed | 192.168.1.146:63178 | Sleep   |  385 | open              |           5044 |
| LEAF | 192.168.3.242 | 3307 |  55 | distributed | 192.168.1.146:10118 | Sleep   |  385 | open              |           5129 |
| LEAF | 192.168.3.242 | 3307 |  77 | distributed | 192.168.1.146:21382 | Sleep   |  385 | open              |           5168 |
| LEAF | 192.168.3.242 | 3307 |  81 | distributed | 192.168.1.146:26758 | Sleep   |  385 | open              |           4998 |
| LEAF | 192.168.3.242 | 3307 | 122 | distributed | 192.168.1.146:55942 | Sleep   |  385 | open              |           5021 |
| LEAF | 192.168.0.24  | 3307 |  75 | distributed | 192.168.1.146:40650 | Sleep   |  385 | open              |           5015 |
| LEAF | 192.168.1.100 | 3307 | 131 | distributed | 192.168.1.146:23998 | Sleep   |  385 | open              |           5013 |
| LEAF | 192.168.3.152 | 3307 |  78 | distributed | 192.168.1.146:11603 | Sleep   |  385 | open              |           5124 |
| LEAF | 192.168.3.152 | 3307 |  83 | distributed | 192.168.1.146:21331 | Sleep   |  385 | open              |           5018 |
| LEAF | 192.168.3.152 | 3307 | 120 | distributed | 192.168.1.146:39763 | Sleep   |  385 | open              |           5092 |
| LEAF | 192.168.1.100 | 3307 |  79 | distributed | 192.168.1.146:1726  | Sleep   |  385 | open              |           5090 |
| LEAF | 192.168.1.100 | 3307 |  82 | distributed | 192.168.1.146:6078  | Sleep   |  385 | open              |           5065 |
| LEAF | 192.168.1.100 | 3307 | 114 | distributed | 192.168.1.146:21438 | Sleep   |  385 | open              |           5160 |
| MA   | 192.168.0.231 | 3306 | 490 | root        | localhost:32737     | Sleep   |   73 | open              |              0 |
+------+---------------+------+-----+-------------+---------------------+---------+------+-------------------+----------------+
```
In the result, each aggregator and each partition on each leaf has one idle open transaction which is holding row locks. The locks are only held on the leaves, because that is where the data is stored. Look for an aggregator process with the same run time as the leaf processes that are holding locks. In the result above, it is the process with connection ID 143, which is running on the child aggregator on host 192.168.1.146:3306. To resolve the current issue, connect to this aggregator and kill the connection with ID 143:

```sql
KILL CONNECTION 143;
```
Alternatively, you can kill any open connections that have been idle for longer than the `lock_wait_timeout`.
Note: Although the error message indicates that the lock is owned by the connection with ID 76, that is not the connection to kill; it belongs to one of the distributed leaf processes, which holds the lock that our query was waiting on. Instead, kill the connection for the aggregator process, which rolls back the entire transaction.
The original connection on the child aggregator that was updating `ORDERS` is now closed, and the transaction is rolled back. If you try to use that session again, it automatically re-establishes a new connection with a new ID.
```sql
SELECT COUNT(*) FROM cust;
```

```
ERROR 2013 (HY000): Lost connection to MySQL server during query
```

Running the query again:

```sql
SELECT COUNT(*) FROM cust;
```

```
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    160
Current database: db1

+----------+
| COUNT(*) |
+----------+
|   390955 |
+----------+
```
Preventing the Issue from Recurring
To prevent the issue from recurring, inspect the application code and ensure that all the transactions are either committed or rolled back, including the transactions in exception handling. Additionally, all manually run transactions must be committed or rolled back as well.
A common cause of this error is if the transaction is executed as part of a stored procedure, but exception handling causes the final commit to get skipped. To prevent this issue, add rollbacks for the transaction in the exception handling section of the stored procedure. For more information, see Transactions in Stored Procedures.
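As a sketch (assuming SingleStore's procedural SQL `EXCEPTION ... WHEN OTHERS THEN` handler; the procedure name is illustrative, and the `cust` table comes from the earlier example), the handler should roll back the open transaction:

```sql
CREATE OR REPLACE PROCEDURE increment_orders() AS
BEGIN
  START TRANSACTION;
  UPDATE cust SET ORDERS = ORDERS + 1 WHERE ID > 7680;
  COMMIT;
EXCEPTION
  -- Without this handler, a failure after START TRANSACTION could leave
  -- an idle open transaction holding row locks.
  WHEN OTHERS THEN
    ROLLBACK;
END;
```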
The impact of this error can be mitigated by setting a session timeout for the client, in order to automatically kill the idle transactions. When a client ends a session, the database rolls back the uncommitted transactions and closes the connection on its end. If the session timeout value is set lower than the lock wait timeout (default 60 seconds), then idle connections will not cause lock wait timeouts to be breached.
Note: Setting a query timeout in Resource Governance will not affect idle transactions. The query timeout limit only applies to executing queries.
ERROR 1706 (HY000): Feature ‘Multi-table UPDATE/DELETE with a reference table as target table’ is not supported by MemSQL
Updating a reference table based on a join with a sharded table returns an error. This is because reference tables are only updatable on the master aggregator while joins against sharded tables must run on the leaves, so the query cannot run like normal updates.
Use the same table type for both the target table and the joined table: make them both sharded tables, or both reference tables.
Alternatively, you can run an `UPSERT` against a reference table that has a primary key. For example, if `t1` is a reference table with a primary key, then the following query runs successfully:

```sql
INSERT INTO t1
SELECT t1.* FROM t1 JOIN t2 ON t1.b = t2.b
ON DUPLICATE KEY UPDATE b = VALUES(b);
```
For more details, refer to the UPDATE topic.
Turning off multi-statement transactions has the following implications:

- Turning off multi-statement transactions makes per-partition components of a distributed transaction commit independently without waiting for other partitions to finish and potentially trigger rollback. This removes the common row lock deadlock (transaction A waiting for transaction B on partition 1, but B waiting for A on partition 2) at the cost of potentially letting failed queries commit partial results.
- If the values inserted are dependent on multiple statements executing concurrently, then inconsistent data may get inserted.
- In case of failure, it is not possible to rollback the entire multi-upsert inside one transaction.
ERROR 1706 (HY000): Leaf Error (127.0.0.1:3307): Feature ‘INSERT IGNORE … ON DUPLICATE KEY UPDATE’ is not supported by MemSQL
When you use the `IGNORE` modifier in an `INSERT` statement, MemSQL ignores records with duplicate key entries and only inserts records with unique key values. The `ON DUPLICATE KEY UPDATE` clause, instead of ignoring the insert, performs an update on the row with the duplicate key value. Using both in the same query leads to a dilemma: whether to ignore the insert or update the record. To prevent any ambiguity in data, MemSQL returns an error instead of choosing one operation over the other.
Use either the `IGNORE` modifier or the `ON DUPLICATE KEY UPDATE` clause in `INSERT` statements, but not both. For more information, see INSERT.