Assess the Situation
This guide is designed to help resolve common failure scenarios that one may encounter during cluster operation.
The most important step in resolving any cluster failure is to assess your cluster and determine:
-
Are there any network issues that are preventing the cluster from operating properly?
-
Are there any recent hardware issues?
-
Are any of the drives suddenly experiencing issues?
-
Is your cluster utilizing high availability, and, if so, which replication mode is it using (redundancy-1 or redundancy-2)?
Knowing the answers to these questions will help you decide which of the following sections will help you resolve your issue(s) and whether any data loss will be incurred.
Tools: Leaf Node Failures
Replace a Failed Leaf Node in a Redundancy-1 Cluster
Review the size of your MemSQL data directory and confirm that at least twice the amount of disk space used by the leaf node’s data directory is available. This space is required to hold a copy of the leaf node’s data directory. Another drive, including an external drive, can also be used to hold a copy of the leaf node’s data.
A leaf node’s data typically resides in /var/lib/memsql/<node-directory>/data
by default. Note that the <node-directory>
in MemSQL v7.1 is a hash, and that you will need to look within this directory to determine if it’s the appropriate node. You can check the ID matches the memsql-id
within the node’s data
directory (e.g. cat ./data/memsql_id
).
This section covers the replacement of a failed leaf node within a redundancy-1 cluster. In this example, it is assumed that data can be recovered from the failed leaf node, which will then be restored.
-
Determine if the leaf node in question is still running within the cluster. If it is, first stop it, and then remove the “leaf” role from the node.
sdb-admin stop-node --memsql-id <node-ID>
sdb-admin remove-leaf --force --memsql-id <node-ID>
-
Assuming that the data is recoverable from the failed leaf node, preserve it by changing to the node directory (on the corresponding host) and compressing it to another location, such as the
/tmp
directory.cd /path/to/memsql/<node-directory>
sudo tar -zcvf /tmp/data.tgz ./data/
-
As the leaf node’s data has been preserved, delete the leaf node from the cluster.
sdb-admin delete-node --memsql-id <node-ID> --skip-remove
-
Create a new node to store this data in. After the node’s been created, stop it so that you can manipulate the node’s data directory.
sdb-admin create-node --host <node's-host-IP> --port <node's-host-port> --password <node-password>
sdb-admin stop-node --memsql-id <node-ID>
-
Navigate to the newly created node’s directory and remove the data directory within. Then, extract the data from the
/tmp
directory to this node’sdata
directory.cd /var/lib/memsql/<new-node-directory>
sudo rm -r ./data
sudo tar -zxvf /tmp/data.tgz
-
Once the extraction is finished, update the ownership and permissions on the directory’s files.
sudo chown memsql:memsql -R ./data
-
Start the new node. The MemSQL ID will now be that of the former node, which you can confirm with the following command.
sdb-admin start-node --all
-
Run the following SQL command on the Master Aggregator host to add this leaf node to the cluster.
ADD LEAF <user>:'<password>'@'<node's-host-IP>':<port>;
Replace a Failed Leaf Node in a Redundancy-2 Cluster
Review the size of your MemSQL data directory and determine if you have enough disk space remaining when you proceed with the recovery process.
This section details how to recover a failed leaf node in a high availability cluster.
Clusters using Async or Sync replication could potentially present different errors, so knowing which replication option that is being used will be key to recovering a failed cluster.
By default, all MemSQL v7.0 or later clusters will run with high availability using sync replication. Sync replication ensures that the shared data housed within the primary and secondary leaf nodes is always in sync.
Async replication treats data differently and, as its first priority, allows the cluster to continue to run. While data will eventually be synced between the leaf nodes, it is not a requirement that it happen at the time of the transaction. Refer to Replication and Durability for more information.
Sometimes data can become out of sync when one or more sizable queries are run. If failover occurs during one of these queries, MemSQL will recognize that it’s failing over from a leaf node that contains newer data. As the asynchronous replication of data on the secondary leaf node was not up to date when the failover occurred, there will be an unavoidable amount of data loss.
As a result, the cluster will stop performing transactions and will throw an Asynchronous replication with a “FailoverLeaf: Couldn’t failover from node." error. This is intentional, as it affords the cluster administrator to take manual control of the cluster to assess what has occurred.
To acknowledge the potential data loss and manually move the data to the secondary leaf node that is now out of sync, use a SQL editor to run the REBALANCE PARTITIONS command on each database. This will prevent any potential data loss from the leaf nodes being out of sync.
Alert: If REBALANCE PARTITIONS
is run before attempting to retain or fix the leaf/partition, the data will be lost and will no longer be recoverable.
REBALANCE PARTITIONS ON db_name;
You may also use the bash script in the One Leaf Node Fails section below to rebalance the partitions on all nodes and restore redundancy on all databases.
One Leaf Node Fails
Reintroduce the Leaf Node
The section details how to reintroduce a failed leaf node in a Redundancy-2 cluster. Reintroducing the leaf node is the simplest solution as MemSQL will automatically reattach the leaf node once it’s back online.
If the failed leaf node was the partition master, those partitions will be promoted to its pair. You may then reintroduce the failed leaf node to the cluster, or add a new leaf node to replace it.
Replace the Leaf Node
The section details how to replace a failed leaf node in a Redundancy-2 cluster with a replacement leaf node from a different host.
If the failed leaf node was the partition master, those partitions will be promoted to its pair. You may then reintroduce the failed leaf node to the cluster, or add a new leaf node to replace it.
-
If the host of the failed leaf node is still available, note either its availability group, or the availability group of its pair.
-
If the host that the failed leaf node resides on has also failed, the leaf node must be removed from the cluster. Determine if the failed leaf node is still shown.
sdb-admin list-nodes
If so, remove it from the cluster.
sdb-admin remove-leaf --memsql-id <node-ID>
If not, skip removing it from the cluster and just remove the failed host.
sdb-admin delete-node --memsql-id <node-ID> --skip-remove
sdb-toolbox-config unregister-host --host <host-IP>
-
The leaf node may still be visible to the cluster, but Toolbox will no longer recognize it. To confirm, run the following SQL command on the Master Aggregator host and look for the IP and port of the failed leaf node.
SHOW LEAVES;
-
If the leaf node persists within the cluster, remove it from the Master Aggregator.
REMOVE LEAF 'IP OF LEAF':PORT;.
-
Using SingleStore DB Toolbox, add a new replacement host to the cluster.
sdb-toolbox-config register-host --host <host-IP> -i <SSH-identity-file>
-
Deploy SingleStore DB to this host.
sdb-deploy install --host <host-IP>
-
Create a replacement node, assign it a “leaf” role, and add it to the availability group you noted earlier.
sdb-admin create-node --host <host-IP> --password <secure-password> --port <port>
sdb-admin add-leaf --memsql-id <mew-node-ID> --availability-group <1 or 2>
-
Rebalance the partitions on all nodes and restore redundancy on all databases.
REBALANCE PARTITIONS
The following script may also be used to rebalance partitions on all databases:
#!/bin/bash for DB in `memsql -u root -h 127.0.0.1 --batch --skip-pager --skip-column-names --execute="SHOW DATABASES" | grep -vE "^(memsql|information_schema|sharding|cluster)$"` do echo "Rebalancing partitions on DB $DB" memsql -u root -h 127.0.0.1 --batch --execute "REBALANCE PARTITIONS ON $DB" done
A Pair of Leaf Nodes Fail
When a pair of leaf nodes fail, their partitions will no longer have any remaining instances, which effectively takes these partitions offline for both reads and writes.
If either of the leaf nodes’ hosts, or a leaf node’s data on a host, is recoverable, a failed leaf node can be reintroduced and reattached to its partitions by following the steps in the Replace a Failed Leaf Node in a Redundancy-1 Cluster section. After following those steps, the partitions will be back online for both reads and writes.
If neither leaf node’s host is recoverable, then data loss has occurred. You must now add replacement leaf nodes and run REBALANCE PARTITIONS ... FORCE
to create new (empty) replacement partitions. This can be by following the steps in the Replace a Leaf Node in a Redundancy-2 Cluster section.
Many Unpaired Leaf Nodes Fail
So long as two paired leaf nodes have not failed, all partitions are still available for reads and writes.
In certain circumstances, all of the leaf nodes in one availability group can fail, but no data loss will be incurred so long as redundancy is restored before another leaf node fails in the remaining availability group.
Many Leaf Nodes Fail, Some of them Paired
When both leaf nodes in a pair fail, every partition that is hosted by these two leaf nodes will be offline to reads and writes.
When one leaf node of a pair fails, the partitions of its pair will remain online for reads and writes.
Offline partitions should be handled using the method detailed in the A Pair of Leaf Nodes Fail section. However, as both leaf nodes are unrecoverable, RESTORE REDUNDANCY or REBALANCE PARTITIONS should only be run after all partitions have either been recovered or abandoned as lost data. Aggregator Failures
Tools: Aggregator Failures
Master Aggregator Fails
If the Master Aggregator node fails and is recoverable, restart the Master Aggregator.
If it is not recoverable, a Child Aggregator must be promoted to the role of Master Aggregator by running either AGGREGATOR SET AS MASTER in SQL or sdb-admin bootstrap-aggregator on the command line.
One Child Aggregator Fails
If a Child Aggregator is recoverable, restart the Child Aggregator.
If it is not recoverable, deploy a new MemSQL aggregator by creating a new node and adding it to the cluster as a Child Aggregator.
Many Aggregators Fail
If more than one aggregator fails, each aggregator can either be reintroduced or removed, created, and then replaced one by one.
End: SingleStore Tools
Ops: Leaf Node Failures
Leaf Failures in a Redundancy-1 Cluster
When a leaf dies in a redundancy-1 cluster, all partitions hosted on that leaf will be offline to reads and writes. There are three potential resolutions: reintroduce the leaf, introduce a replacement leaf, or recreate the lost partitions on the remaining leaves.
If the leaf machine is recoverable, MemSQL will automatically reattach the leaf as soon as it is back online.
If the leaf machine is unrecoverable but you can still access its data, then you can introduce a replacement following the guide on how to Replace a Dead Leaf in a Redundancy-1 Cluster.
If the leaf machine and its data is unrecoverable, and you wish to introduce a replacement, you can follow the guide on how to Replace a Dead Leaf in a Redundancy-1 Cluster, steps 1 to 3. Then run REBALANCE PARTITIONS … FORCE. MemSQL will create new (empty) partitions to replace those that were lost.
Finally, if you wish to recreate the lost partitions on the remaining leaves, run REBALANCE PARTITIONS ... FORCE
. MemSQL will distribute the replacement partitions across the remaining leaves.
REBALANCE PARTITIONS .. FORCE locks in data loss. After running it, you can no longer reintroduce the data from the lost machine’s partitions.
Replace a Dead Leaf in a Redundancy-1 Cluster
This guide shows how to replace a dead leaf in a redundancy-1 cluster. It is assumed that data can be recovered from the dead leaf and will be restored at Step 5.
-
Unmonitor the dead MemSQL Leaf and uninstall the dead MemSQL Ops Agent by executing the following commands:
memsql-ops memsql-unmonitor <DEAD-LEAF-ID>
memsql-ops agent-uninstall --force --agent-id <DEAD-AGENT-ID>
You can get Leaf IDs and Agent IDs by executing the memsql-ops memsql-list
and memsql-ops agent-list
commands.
-
Remove the dead leaf from MemSQL cluster by running REMOVE LEAF on the dead leaf to remove it from
SHOW LEAVES
and free up its pairing slot.REMOVE LEAF "<DEAD-LEAF>"[:<PORT>];
-
Deploy a new MemSQL leaf by deploying a new MemSQL Ops Agent and MemSQL leaf via the web UI, or via CLI. Once the agent is deployed, you can replace the settings.conf file from the dead agent:
memsql-ops agent-deploy --host <HOST-IP> [--user <USER> --identity-file /path/to/id_rsa]
memsql-ops agent-stop <NEW-AGENT-ID>
# edit your settings.conf file at /var/lib/memsql-ops/settings.conf
memsql-ops agent-start <NEW-AGENT-ID>
memsql-ops memsql-deploy --agent-id <NEW-AGENT-ID> --role leaf
-
Stop the new MemSQL leaf to make sure the new MemSQL Leaf is NOT running before copying the recovered data:
memsql-ops memsql-stop <NEW-MEMSQL-LEAF-ID>
-
Copy the recovered data into the new MemSQL leaf data directory. Make sure to save the
memsql_id
file.sudo mv /var/lib/memsql/leaf-3306/data/memsql_id /tmp
sudo cp -r /path/to/recovered/leaf-3306/data/* /var/lib/memsql/leaf-3306/data/
sudo mv /tmp/memsql_id /var/lib/memsql/leaf-3306/data/
sudo chown -R memsql.memsql /var/lib/memsql/leaf-3306/data/
-
Restart the new MemSQL leaf.
memsql-ops memsql-start <NEW-MEMSQL-LEAF-ID>
-
Partitions are currently present in the new MemSQL leaf, but MemSQL distributed system is still unaware of them, so you must reattach the partitions by trigger partitions detection. Temporarily remove the new leaf from the cluster. MemSQL Ops will automatically re-attach the leaf, and this action will trigger detecting (and attaching) all partitions.
On the Master Aggregator, run:
REMOVE LEAF "<NEW-LEAF>";
In the examples below, 10.0.0.101 is the new MemSQL Leaf, and 10.0.2.128 is an existing leaf.
Before reattaching partitions:
SHOW PARTITIONS ON `memsql_demo`; **** +---------+------------+------+--------+--------+ | Ordinal | Host | Port | Role | Locked | +---------+------------+------+--------+--------+ | 0 | 10.0.2.128 | 3306 | Master | 0 | | 1 | 10.0.2.128 | 3306 | Master | 0 | | 2 | NULL | NULL | NULL | 0 | | 3 | NULL | NULL | NULL | 0 | ...
After reattaching partitions:
SHOW PARTITIONS ON `memsql_demo`; **** +---------+------------+------+--------+--------+ | Ordinal | Host | Port | Role | Locked | +---------+------------+------+--------+--------+ | 0 | 10.0.2.128 | 3306 | Master | 0 | | 1 | 10.0.2.128 | 3306 | Master | 0 | | 2 | 10.0.0.101 | 3306 | Master | 0 | | 3 | 10.0.0.101 | 3306 | Master | 0 | ...
Leaf Failures in a Redundancy-2 Cluster
One Leaf Dies
Any partitions for which the dead leaf was the partition master will be promoted on the dead leaf’s pair.
You can reintroduce the dead leaf, or add a new leaf to replace it.
Reintroducing the leaf is the simplest solution – MemSQL will automatically reattach the leaf as soon as it is back online.
If you decide to add a replacement leaf on a different host, you can follow this guide on how to replace a dead leaf in a redundancy-2 Cluster.
If the machine for the dead agent is still accessible, ensure all memsqld
processes are killed and their data directories are emptied before attempting to uninstall the agent.
-
Unmonitor the dead MemSQL Leaf and uninstall the dead MemSQL Ops Agent to remove the dead leaf from MemSQL Ops:
memsql-ops memsql-unmonitor <DEAD-LEAF-ID>
memsql-ops agent-uninstall --force --agent-id <DEAD-AGENT-ID>
Note that leaf and agent ids can be retrieved with
memsql-ops memsql-list
andmemsql-ops agent-list
. -
Figure out which availability group the failed leaf was in by running
SHOW LEAVES
on the master aggregator to identify the dead leaf and its availability group:show leaves; **** +----------------+------+--------------------+----------------+-----------+---------+--------------------+------------------------------+ | Host | Port | Availability_Group | Pair_Host | Pair_Port | State | Opened_Connections | Average_Roundtrip_Latency_ms | +----------------+------+--------------------+----------------+-----------+---------+--------------------+------------------------------+ | 54.242.219.243 | 3306 | 1 | 54.196.216.103 | 3306 | online | 1 | 0.640 | | 54.160.224.3 | 3306 | 1 | 54.234.29.206 | 3306 | online | 1 | 0.623 | | 54.196.216.103 | 3306 | 2 | 54.242.219.243 | 3306 | online | 1 | 0.583 | | 54.234.29.206 | 3306 | 2 | 54.160.224.3 | 3306 | offline | 0 | NULL | +----------------+------+--------------------+----------------+-----------+---------+--------------------+------------------------------+
In this example, the offline leaf was in availability group 2.
-
Remove the dead leaf from MemSQL cluster by running
REMOVE LEAF
on the master aggregator to remove the dead leaf fromSHOW LEAVES
and free up its pairing slot.REMOVE LEAF "<DEAD-LEAF>"[:<PORT>] FORCE;
-
Deploy a new MemSQL leaf by deploying a new MemSQL Ops Agent and MemSQL leaf via the web UI, or via CLI. Once the agent is deployed, you can replace the settings.conf file from the dead agent:
memsql-ops agent-deploy --host <HOST-IP> [--user <USER> --identity-file /path/to/id_rsa]
memsql-ops agent-stop <NEW-AGENT-ID>
# edit your settings.conf file at /var/lib/memsql-ops/settings.conf
memsql-ops agent-start <NEW-AGENT-ID>
memsql-ops memsql-deploy --agent-id <NEW-AGENT-ID> --role leaf --availability-group <GROUP>
After this has completed,
SHOW LEAVES
on the master aggregator should indicate that all leaves are online and paired:+----------------+------+--------------------+----------------+-----------+--------+--------------------+------------------------------+ | Host | Port | Availability_Group | Pair_Host | Pair_Port | State | Opened_Connections | Average_Roundtrip_Latency_ms | +----------------+------+--------------------+----------------+-----------+--------+--------------------+------------------------------+ | 54.242.219.243 | 3306 | 1 | 54.196.216.103 | 3306 | online | 2 | 0.578 | | 54.160.224.3 | 3306 | 1 | 54.145.52.142 | 3306 | online | 2 | 0.624 | | 54.196.216.103 | 3306 | 2 | 54.242.219.243 | 3306 | online | 2 | 0.612 | | 54.145.52.142 | 3306 | 2 | 54.160.224.3 | 3306 | online | 1 | 0.568 | +----------------+------+--------------------+----------------+-----------+--------+--------------------+------------------------------+
-
Finally, you need to rebalance the cluster, by running
REBALANCE PARTITIONS
for all your databases. On the Master Aggregator, you can run the following script:#!/bin/bash for DB in `mysql -u root -h 127.0.0.1 --batch --skip-pager --skip-column-names --execute="SHOW DATABASES" | grep -vE "^(memsql|information_schema|sharding)$"` do echo "Rebalancing partitions on DB $DB" mysql -u root -h 127.0.0.1 --batch --execute "REBALANCE PARTITIONS ON $DB" done
A Pair of Leaves Die
In this case, the partitions that were hosted on the dead leaves have no remaining instances. This means that these partitions are offline to reads and writes.
If either of the leaf machines (or at least its data) is recoverable, you can reintroduce it and reattach its partitions, as in the Leaf Failures In a Redundancy-1 Cluster section. At this point, the partitions are online for reads and writes. The failure scenario is now the same as the One Leaf Dies scenario in a redundancy-2 cluster.
If neither leaf machine is recoverable, then data loss has occurred. You can now add replacement leaves, after which you may run REBALANCE PARTITIONS ... FORCE
to create new (empty) replacement partitions.
Many Unpaired Leaves Die
As long as two paired leaves have not both died, all partitions are still available for reads and writes.
As a special case of this scenario, all leaves in one availability group can be down. No data loss is incurred as long as redundancy is restored before an additional leaf from the remaining availability group dies.
Many Leaves Die, Some of Them Paired
Every partition for which both leaves hosting it died is now offline to reads and writes. Partitions for which only one leaf in the relevant pair died remain online.
Offline partitions should be handled as they are in the scenario A pair of leaves die. However, RESTORE REDUNDANCY or REBALANCE PARTITIONS should be run only after all partitions has either been either recovered or given up as lost data (because both leaves that were hosting its data are unrecoverable).
Ops: Aggregator Failures
Master Aggregator Dies
If the master aggregator dies and is recoverable, all that is necessary is to restart the MemSQL process.
If it is not recoverable, then a child aggregator must be promoted to master with AGGREGATOR SET AS MASTER.
One Child Aggregator Dies
If the aggregator is recoverable, all that is necessary is to restart the MemSQL process. Otherwise, simply deploy a new MemSQL instance as an aggregator.
Many Aggregators Die
If more than one aggregator dies, they can be reintroduced or replaced one by one.
End: MemSQL Ops