Dealing with Cluster Failures min read


Info

This topic does not apply to SingleStore Managed Service.

Assess the Situation

This guide is designed to help resolve common failure scenarios that one may encounter during cluster operation.

The most important step in resolving any cluster failure is to assess your cluster and determine:

  • Are there any network issues that are preventing the cluster from operating properly?

  • Are there any recent hardware issues?

  • Are any of the drives suddenly experiencing issues?

  • Is your cluster utilizing high availability, and, if so, which replication mode is it using (redundancy-1 or redundancy-2)?

Knowing the answers to these questions will help you decide which of the following sections will help you resolve your issue(s) and whether any data loss will be incurred.

Tools: Leaf Node Failures

Replace a Failed Leaf Node in a Redundancy-1 Cluster

Warning

Review the size of your SingleStore DB data directory and confirm that at least twice the amount of disk space used by the leaf node’s data directory is available. This space is required to hold a copy of the leaf node’s data directory. Another drive, including an external drive, can also be used to hold a copy of the leaf node’s data.

A leaf node’s data typically resides in /var/lib/memsql/<node-directory>/data by default. Note that the <node-directory> in SingleStore DB v7.1 is a hash, and that you will need to look within this directory to determine if it’s the appropriate node. You can check the ID matches the memsql-id within the node’s data directory (e.g. cat ./data/memsql_id).

This section covers the replacement of a failed leaf node within a redundancy-1 cluster. In this example, it is assumed that data can be recovered from the failed leaf node, which will then be restored.

  1. Determine if the leaf node in question is still running within the cluster. If it is, first stop it, and then remove the “leaf” role from the node.

    sdb-admin stop-node --memsql-id <node-ID>
    
    sdb-admin remove-leaf --force --memsql-id <node-ID>
    
  2. Assuming that the data is recoverable from the failed leaf node, preserve it by changing to the node directory (on the corresponding host) and compressing it to another location, such as the /tmp directory.

    cd /path/to/memsql/<node-directory>
    
    sudo tar -zcvf /tmp/data.tgz ./data/
    
  3. As the leaf node’s data has been preserved, delete the leaf node from the cluster.

    sdb-admin delete-node --memsql-id <node-ID> --skip-remove
    
  4. Create a new node to store this data in. After the node’s been created, stop it so that you can manipulate the node’s data directory.

    sdb-admin create-node --host [IP OF NODE HOST] --port [PORT OF NODE HOST] --password [PASSWORD OF NODE]
    
    sdb-admin stop-node --memsql-id <node-ID>
    
  5. Navigate to the newly created node’s directory and remove the data directory within. Then, extract the data from the /tmp directory to this node’s data directory.

    cd /var/lib/memsql/<new-node-directory>
    
    sudo rm -r ./data
    
    sudo tar -zxvf /tmp/data.tgz
    
  6. Once the extraction is finished, update the ownership and permissions on the directory’s files.

    sudo chown memsql:memsql -R ./data
    
  7. Start the new node. The MemSQL ID will now be that of the former node, which you can confirm with the following command.

    sdb-admin start-node --all
    
  8. Run the following SQL command on the Master Aggregator host to add this leaf node to the cluster.

    ADD LEAF <user>:'<password>'@'<node's-host-IP>':<port>;
    

Replace a Failed Leaf Node in a Redundancy-2 Cluster

Warning

Review the size of your SingleStore DB data directory and determine if you have enough disk space remaining when you proceed with the recovery process.

This section details how to recover a failed leaf node in a high availability cluster.

Clusters using Async or Sync replication could potentially present different errors, so knowing which replication option that is being used will be key to recovering a failed cluster.

By default, all MemSQL v7.0 or later clusters will run with high availability using sync replication. Sync replication ensures that the shared data housed within the primary and secondary leaf nodes is always in sync.

Async replication treats data differently and, as its first priority, allows the cluster to continue to run. While data will eventually be synced between the leaf nodes, it is not a requirement that it happen at the time of the transaction. Refer to Replication and Durability for more information.

Sometimes data can become out of sync when one or more sizable queries are run. If failover occurs during one of these queries, SingleStore DB will recognize that it’s failing over from a leaf node that contains newer data. As the asynchronous replication of data on the secondary leaf node was not up to date when the failover occurred, there will be an unavoidable amount of data loss.

As a result, the cluster will stop performing transactions and will throw an Asynchronous replication with a “FailoverLeaf: Couldn’t failover from node." error. This is intentional, as it affords the cluster administrator to take manual control of the cluster to assess what has occurred.

To acknowledge the potential data loss and manually move the data to the secondary leaf node that is now out of sync, use a SQL editor to run the REBALANCE PARTITIONS command on each database. This will prevent any potential data loss from the leaf nodes being out of sync.

Alert: If REBALANCE PARTITIONS is run before attempting to retain or fix the leaf/partition, the data will be lost and will no longer be recoverable.

REBALANCE PARTITIONS ON db-name;

You may also use the bash script in the One Leaf Node Fails section below to rebalance the partitions on all nodes and restore redundancy on all databases.

One Leaf Node Fails

Reintroduce the Leaf Node

The section details how to reintroduce a failed leaf node in a Redundancy-2 cluster. Reintroducing the leaf node is the simplest solution as SingleStore DB will automatically reattach the leaf node once it’s back online.

If the failed leaf node was the partition master, those partitions will be promoted to its pair. You may then reintroduce the failed leaf node to the cluster, or add a new leaf node to replace it.

Replace the Leaf Node

The section details how to replace a failed leaf node in a Redundancy-2 cluster with a replacement leaf node from a different host.

If the failed leaf node was the partition master, those partitions will be promoted to its pair. You may then reintroduce the failed leaf node to the cluster, or add a new leaf node to replace it.

  1. If the host of the failed leaf node is still available, note either its availability group, or the availability group of its pair.

  2. If the host that the failed leaf node resides on has also failed, the leaf node must be removed from the cluster. Determine if the failed leaf node is still shown.

    sdb-admin list-nodes
    

    If so, remove it from the cluster.

    sdb-admin remove-leaf --memsql-id <node-ID>
    

    If not, skip removing it from the cluster and just remove the failed host.

    sdb-admin delete-node --memsql-id <node-ID> --skip-remove
    
    sdb-toolbox-config unregister-host --host <host-IP>
    
  3. The leaf node may still be visible to the cluster, but Toolbox will no longer recognize it. To confirm, run the following SQL command on the Master Aggregator host and look for the IP and port of the failed leaf node.

    SHOW LEAVES;
    
  4. If the leaf node persists within the cluster, remove it from the Master Aggregator.

    REMOVE LEAF 'IP OF LEAF':PORT;.
    
  5. Using Toolbox, add a new replacement host to the cluster.

    sdb-toolbox-config register-host --host <host-IP> -i <SSH-identity-file>
    
  6. Deploy SingleStore DB to this host.

    sdb-deploy install --host <host-IP>
    
  7. Create a replacement node, assign it a “leaf” role, and add it to the availability group you noted earlier.

    sdb-admin create-node --host <host-IP> --password <secure-password> --port <port>
    
    sdb-admin add-leaf --memsql-id <ID-of-new-node> --availability-group <1 or 2>
    
  8. Rebalance the partitions on all nodes and restore redundancy on all databases.

    REBALANCE PARTITIONS
    

    The following script may also be used to rebalance partitions on all databases:

    #!/bin/bash
    for DB in `memsql -u root -h 127.0.0.1 --batch --skip-pager --skip-column-names --execute="SHOW DATABASES" | grep -vE "^(memsql|information_schema|sharding|cluster)$"`
    do
      echo "Rebalancing partitions on DB $DB"
      memsql -u root -h 127.0.0.1 --batch --execute "REBALANCE PARTITIONS ON $DB"
    done
    

A Pair of Leaf Nodes Fail

When a pair of leaf nodes fail, their partitions will no longer have any remaining instances, which effectively takes these partitions offline for both reads and writes.

If either of the leaf nodes’ hosts, or a leaf node’s data on a host, is recoverable, a failed leaf node can be reintroduced and reattached to its partitions by following the steps in the Replace a Failed Leaf Node in a Redundancy-1 Cluster section. After following those steps, the partitions will be back online for both reads and writes.

If neither leaf node’s host is recoverable, then data loss has occurred. You must now add replacement leaf nodes and run REBALANCE PARTITIONS ... FORCE to create new (empty) replacement partitions. This can be done by following the steps in the Replace a Leaf Node in a Redundancy-2 Cluster section.

Many Unpaired Leaf Nodes Fail

So long as two paired leaf nodes have not failed, all partitions are still available for reads and writes.

In certain circumstances, all of the leaf nodes in one availability group can fail, but no data loss will be incurred so long as redundancy is restored before another leaf node fails in the remaining availability group.

Many Leaf Nodes Fail, Some of them Paired

When both leaf nodes in a pair fail, every partition that is hosted by these two leaf nodes will be offline to reads and writes.

When one leaf node of a pair fails, the partitions of its pair will remain online for reads and writes.

Offline partitions should be handled using the method detailed in the A Pair of Leaf Nodes Fail section. However, as both leaf nodes are unrecoverable, RESTORE REDUNDANCY or REBALANCE PARTITIONS should only be run after all partitions have either been recovered or abandoned as lost data.

Tools: Aggregator Failures

Master Aggregator Fails

If the Master Aggregator node fails and is recoverable, restart the Master Aggregator.

If it is not recoverable, a Child Aggregator must be promoted to the role of Master Aggregator by running either AGGREGATOR SET AS MASTER in SQL or sdb-admin bootstrap-aggregator on the command line.

One Child Aggregator Fails

If a Child Aggregator is recoverable, restart the Child Aggregator.

If it is not recoverable, deploy a new aggregator by creating a new node and adding it to the cluster as a Child Aggregator.

Many Aggregators Fail

If more than one aggregator fails, each aggregator can either be reintroduced or removed, created, and then replaced one by one.

End: SingleStore Tools

Ops: Leaf Node Failures

Leaf Failures in a Redundancy-1 Cluster

When a leaf dies in a redundancy-1 cluster, all partitions hosted on that leaf will be offline to reads and writes. There are three potential resolutions: reintroduce the leaf, introduce a replacement leaf, or recreate the lost partitions on the remaining leaves.

If the leaf machine is recoverable, SingleStore DB will automatically reattach the leaf as soon as it is back online.

If the leaf machine is unrecoverable but you can still access its data, then you can introduce a replacement following the guide on how to Replace a Dead Leaf in a Redundancy-1 Cluster.

If the leaf machine and its data is unrecoverable, and you wish to introduce a replacement, you can follow the guide on how to Replace a Dead Leaf in a Redundancy-1 Cluster, steps 1 to 3. Then run REBALANCE PARTITIONS … FORCE. SingleStore DB will create new (empty) partitions to replace those that were lost.

Finally, if you wish to recreate the lost partitions on the remaining leaves, run REBALANCE PARTITIONS ... FORCE. SingleStore DB will distribute the replacement partitions across the remaining leaves.

Warning

REBALANCE PARTITIONS .. FORCE locks in data loss. After running it, you can no longer reintroduce the data from the lost machine’s partitions.

Replace a Dead Leaf in a Redundancy-1 Cluster

This guide shows how to replace a dead leaf in a redundancy-1 cluster. It is assumed that data can be recovered from the dead leaf and will be restored at Step 5.

  1. Unmonitor the dead SingleStore DB Leaf and uninstall the dead MemSQL Ops Agent by executing the following commands:

    memsql-ops memsql-unmonitor <DEAD-LEAF-ID>
    
    memsql-ops agent-uninstall --force --agent-id <DEAD-AGENT-ID>
    
Info

You can get Leaf IDs and Agent IDs by executing the memsql-ops memsql-list and memsql-ops agent-list commands.

  1. Remove the dead leaf from cluster by running REMOVE LEAF on the dead leaf to remove it from SHOW LEAVES and free up its pairing slot.

    memsql> REMOVE LEAF "<DEAD-LEAF>"[:<PORT>];
    
  2. Deploy a new leaf by deploying a new MemSQL Ops Agent and leaf via the web UI, or via CLI. Once the agent is deployed, you can replace the settings.conf file from the dead agent:

    memsql-ops agent-deploy --host <HOST-IP> [--user <USER> --identity-file /path/to/id_rsa]
    
    memsql-ops agent-stop <NEW-AGENT-ID>
    

    Edit your settings.conf file at /var/lib/memsql-ops/settings.conf.

    memsql-ops agent-start <NEW-AGENT-ID>
    
    memsql-ops memsql-deploy --agent-id <NEW-AGENT-ID> --role leaf
    
  3. Stop the new leaf to make sure the new SingleStore DB Leaf is NOT running before copying the recovered data:

    memsql-ops memsql-stop <NEW-LEAF-ID>
    
  4. Copy the recovered data into the new leaf data directory. Make sure to save the memsql_id file.

    sudo mv /var/lib/memsql/leaf-3306/data/memsql_id /tmp
    
    sudo cp -r /path/to/recovered/leaf-3306/data/* /var/lib/memsql/leaf-3306/data/
    
    sudo mv /tmp/memsql_id /var/lib/memsql/leaf-3306/data/
    
    sudo chown -R memsql.memsql /var/lib/memsql/leaf-3306/data/
    
  5. Restart the new leaf.

    memsql-ops memsql-start <NEW-LEAF-ID>
    
  6. Partitions are currently present in the new leaf, but SingleStore DB distributed system is still unaware of them, so you must reattach the partitions by trigger partitions detection. Temporarily remove the new leaf from the cluster. MemSQL Ops will automatically re-attach the leaf, and this action will trigger detecting (and attaching) all partitions.

    On the Master Aggregator, run:

    REMOVE LEAF "<NEW-LEAF>";
    

    In the examples below, 10.0.0.101 is the new SingleStore DB Leaf, and 10.0.2.128 is an existing leaf.

    Before reattaching partitions:

    SHOW PARTITIONS ON `memsql_demo`;
    ****
    +---------+------------+------+--------+--------+
    | Ordinal | Host       | Port | Role   | Locked |
    +---------+------------+------+--------+--------+
    |       0 | 10.0.2.128 | 3306 | Master |      0 |
    |       1 | 10.0.2.128 | 3306 | Master |      0 |
    |       2 | NULL       | NULL | NULL   |      0 |
    |       3 | NULL       | NULL | NULL   |      0 |
    ...
    

    After reattaching partitions:

    SHOW PARTITIONS ON `memsql_demo`;
    ****
    +---------+------------+------+--------+--------+
    | Ordinal | Host       | Port | Role   | Locked |
    +---------+------------+------+--------+--------+
    |       0 | 10.0.2.128 | 3306 | Master |      0 |
    |       1 | 10.0.2.128 | 3306 | Master |      0 |
    |       2 | 10.0.0.101 | 3306 | Master |      0 |
    |       3 | 10.0.0.101 | 3306 | Master |      0 |
    ...
    

Leaf Failures in a Redundancy-2 Cluster

One Leaf Dies

Any partitions for which the dead leaf was the partition master will be promoted on the dead leaf’s pair.

You can reintroduce the dead leaf, or add a new leaf to replace it.

Reintroducing the leaf is the simplest solution – SingleStore DB will automatically reattach the leaf as soon as it is back online.

If you decide to add a replacement leaf on a different host, you can follow this guide on how to replace a dead leaf in a redundancy-2 Cluster.

Warning

If the machine for the dead agent is still accessible, ensure all memsqld processes are killed and their data directories are emptied before attempting to uninstall the agent.

  1. Unmonitor the dead SingleStore DB Leaf and uninstall the dead MemSQL Ops Agent to remove the dead leaf from MemSQL Ops:

    memsql-ops memsql-unmonitor <DEAD-LEAF-ID>
    
    memsql-ops agent-uninstall --force --agent-id <DEAD-AGENT-ID>
    

    Note that leaf and agent ids can be retrieved with memsql-ops memsql-list and memsql-ops agent-list.

  2. Figure out which availability group the failed leaf was in by running SHOW LEAVES on the master aggregator to identify the dead leaf and its availability group:

    show leaves;
    ****
    +----------------+------+--------------------+----------------+-----------+---------+--------------------+------------------------------+
    | Host           | Port | Availability_Group | Pair_Host      | Pair_Port | State   | Opened_Connections | Average_Roundtrip_Latency_ms |
    +----------------+------+--------------------+----------------+-----------+---------+--------------------+------------------------------+
    | 54.242.219.243 | 3306 |                  1 | 54.196.216.103 |      3306 | online  |                  1 |                        0.640 |
    | 54.160.224.3   | 3306 |                  1 | 54.234.29.206  |      3306 | online  |                  1 |                        0.623 |
    | 54.196.216.103 | 3306 |                  2 | 54.242.219.243 |      3306 | online  |                  1 |                        0.583 |
    | 54.234.29.206  | 3306 |                  2 | 54.160.224.3   |      3306 | offline |                  0 |                         NULL |
    +----------------+------+--------------------+----------------+-----------+---------+--------------------+------------------------------+
    

    In this example, the offline leaf was in availability group 2.

  3. Remove the dead leaf from cluster by running REMOVE LEAF on the master aggregator to remove the dead leaf from SHOW LEAVES and free up its pairing slot.

    REMOVE LEAF "<DEAD-LEAF>"[:<PORT>] FORCE;
    
  4. Deploy a new leaf by deploying a new MemSQL Ops Agent and leaf via the web UI, or via CLI. Once the agent is deployed, you can replace the settings.conf file from the dead agent:

    memsql-ops agent-deploy --host <HOST-IP> [--user <USER> --identity-file /path/to/id_rsa]
    
    memsql-ops agent-stop <NEW-AGENT-ID>
    

    Edit your settings.conf file at /var/lib/memsql-ops/settings.conf.

    memsql-ops agent-start <NEW-AGENT-ID>
    
    memsql-ops memsql-deploy --agent-id <NEW-AGENT-ID> --role leaf --availability-group <GROUP>
    

    After this has completed, SHOW LEAVES on the master aggregator should indicate that all leaves are online and paired:

    +----------------+------+--------------------+----------------+-----------+--------+--------------------+------------------------------+
    | Host           | Port | Availability_Group | Pair_Host      | Pair_Port | State  | Opened_Connections | Average_Roundtrip_Latency_ms |
    +----------------+------+--------------------+----------------+-----------+--------+--------------------+------------------------------+
    | 54.242.219.243 | 3306 |                  1 | 54.196.216.103 |      3306 | online |                  2 |                        0.578 |
    | 54.160.224.3   | 3306 |                  1 | 54.145.52.142  |      3306 | online |                  2 |                        0.624 |
    | 54.196.216.103 | 3306 |                  2 | 54.242.219.243 |      3306 | online |                  2 |                        0.612 |
    | 54.145.52.142  | 3306 |                  2 | 54.160.224.3   |      3306 | online |                  1 |                        0.568 |
    +----------------+------+--------------------+----------------+-----------+--------+--------------------+------------------------------+
    
  5. Finally, you need to rebalance the cluster, by running REBALANCE PARTITIONS for all your databases. On the Master Aggregator, you can run the following script:

    #!/bin/bash
    
    for DB in `mysql -u root -h 127.0.0.1 --batch --skip-pager --skip-column-names --execute="SHOW DATABASES" | grep -vE "^(memsql|information_schema|sharding)$"`
    do
      echo "Rebalancing partitions on DB $DB"
      mysql -u root -h 127.0.0.1 --batch --execute "REBALANCE PARTITIONS ON $DB"
    done
    

A Pair of Leaves Die

In this case, the partitions that were hosted on the dead leaves have no remaining instances. This means that these partitions are offline to reads and writes.

If either of the leaf machines (or at least its data) is recoverable, you can reintroduce it and reattach its partitions, as in the Leaf Failures In a Redundancy-1 Cluster section. At this point, the partitions are online for reads and writes. The failure scenario is now the same as the One Leaf Dies scenario in a redundancy-2 cluster.

If neither leaf machine is recoverable, then data loss has occurred. You can now add replacement leaves, after which you may run REBALANCE PARTITIONS ... FORCE to create new (empty) replacement partitions.

Many Unpaired Leaves Die

As long as two paired leaves have not both died, all partitions are still available for reads and writes.

As a special case of this scenario, all leaves in one availability group can be down. No data loss is incurred as long as redundancy is restored before an additional leaf from the remaining availability group dies.

Many Leaves Die, Some of Them Paired

Every partition for which both leaves hosting it died is now offline to reads and writes. Partitions for which only one leaf in the relevant pair died remain online.

Offline partitions should be handled as they are in the scenario A pair of leaves die. However, RESTORE REDUNDANCY or REBALANCE PARTITIONS should be run only after all partitions has either been either recovered or given up as lost data (because both leaves that were hosting its data are unrecoverable).

Ops: Aggregator Failures

Master Aggregator Dies

If the master aggregator dies and is recoverable, all that is necessary is to restart the SingleStore DB process.

If it is not recoverable, then a child aggregator must be promoted to master with AGGREGATOR SET AS MASTER.

One Child Aggregator Dies

If the aggregator is recoverable, all that is necessary is to restart the SingleStore DB process. Otherwise, simply deploy a new SingleStore DB instance as an aggregator.

Many Aggregators Die

If more than one aggregator dies, they can be reintroduced or replaced one by one.

End: MemSQL Ops