Taking Leaves Offline without Cluster Downtime


Info

This topic does not apply to SingleStore Managed Service.

Info

If you are managing your cluster with MemSQL Ops, go here.

Occasionally, hosts need to be taken offline for maintenance (upgrading memory, etc.). This can present a challenge if these hosts are home to one or more leaf nodes.

By following the steps below, you can detach leaf nodes from a cluster, take the host offline for maintenance, and attach the leaves back to the cluster following maintenance. This can all be done without downtime to the cluster.

Assumptions:

  • The steps below assume the host IP addresses will not change during maintenance.

  • The steps below assume the cluster is configured for High Availability (redundancy 2). If both leaves in a paired group of leaves are detached, the cluster will become unavailable and downtime will be experienced. For this reason, only one availability group of leaves should be detached at a time.

Step 1: Check for long-running queries

Before removing leaves, make sure there are no long-running queries present in the cluster. You can check this by using the SQL Editor in SingleStore DB Studio and running the following:

SELECT * FROM information_schema.PROCESSLIST WHERE COMMAND = 'QUERY' AND STATE = 'executing';

Step 2: Ensure all database partitions are balanced

Read the Understanding Orphaned Partitions topic to verify whether any orphaned partitions exist in the cluster. If there are, that topic explains how to resolve them.
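As a quick check, you can also ask the master aggregator to plan a rebalance without executing it; if the plan is empty, the database's partitions are already balanced. A sketch, where `db_name` is a placeholder for one of your database names:

```sql
-- Run on the master aggregator. EXPLAIN shows the rebalance plan
-- without moving any data; an empty plan means the partitions of
-- db_name (placeholder) are already balanced.
EXPLAIN REBALANCE PARTITIONS ON db_name;
```

Repeat this for each database in the cluster before proceeding.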

Step 3: Confirm the leaf node you want to take offline has an online paired leaf on a different host

To confirm this, run sdb-admin show-leaves and check the results. For example, suppose you have a leaf node running on 172.18.1.5 that you want to take offline. In the output below, this node’s paired host is 172.18.1.6, and that paired host is online:

sdb-admin show-leaves
✓ Successfully ran 'memsqlctl show-leaves'
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+
|    Host    | Port | Availability Group | Pair Host  | Pair Port | State  | Opened Connections | Average Roundtrip Latency (ms) |
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+
| 172.18.1.5 | 3306 | 1                  | 172.18.1.6 | 3306      | online | 1                  | 1.538                          |
| 172.18.1.5 | 3307 | 1                  | 172.18.1.6 | 3307      | online | 2                  | 0.765                          |
| 172.18.1.6 | 3306 | 2                  | 172.18.1.5 | 3306      | online | 2                  | 0.898                          |
| 172.18.1.6 | 3307 | 2                  | 172.18.1.5 | 3307      | online | 2                  | 1.491                          |
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+

Step 4: Detach the leaf or aggregator node(s) from the host to be taken offline for maintenance

A leaf node is detached from a cluster by using the following syntax:

DETACH LEAF 'host':port;

For more information on this command see the reference.

Note: If both leaves in a paired group of leaves are detached, the cluster will become unavailable and downtime will be experienced. For this reason, only one availability group of leaves should be detached at a time.
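For example, to detach the two leaf nodes on 172.18.1.5 shown in the Step 3 output (both belong to availability group 1, so their pairs on 172.18.1.6 remain online):

```sql
-- Run on the master aggregator; the hosts and ports come from the
-- show-leaves output in Step 3.
DETACH LEAF '172.18.1.5':3306;
DETACH LEAF '172.18.1.5':3307;
```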

For host machines running aggregator nodes, use the following syntax to detach an aggregator from a host:

REMOVE AGGREGATOR 'host':port;

For more information on this command see the reference.

Step 5: Stop the SingleStore node(s)

Stop the SingleStore node(s) (leaves and aggregators) residing on all hosts that will be taken offline for maintenance.

sdb-admin stop-node --memsql-id <MemSQL_ID>

For more information on this command see the reference.

Step 6: Take the host offline, perform maintenance, bring host back online and confirm SingleStore DB is running

It is now safe to power down the host and perform maintenance. After performing maintenance, bring the host back online and confirm that SingleStore DB is running on it.

Step 7: Start the SingleStore node(s)

Start the SingleStore node(s) (leaves and aggregators) residing on all hosts that were previously taken offline for maintenance and are now back online.

sdb-admin start-node --memsql-id <MemSQL_ID>

Step 8: Attach the leaf or aggregator node(s) back to the host that was taken offline for maintenance

Once maintenance is complete, the host is back online, and SingleStore DB is running, attach the leaf or aggregator node(s) back to the cluster.

A leaf node is attached to a cluster by using the following command from the master aggregator node:

ATTACH LEAF 'host':port NO REBALANCE;

For more information on this command, see the reference.
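Continuing the example from Steps 3 and 4, reattaching the two leaves on 172.18.1.5 would look like:

```sql
-- Run on the master aggregator once the nodes are started again.
-- NO REBALANCE defers data movement until the rebalance in Step 9.
ATTACH LEAF '172.18.1.5':3306 NO REBALANCE;
ATTACH LEAF '172.18.1.5':3307 NO REBALANCE;
```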

Note: If you took multiple leaf nodes offline and are attaching them back to the cluster, you can attach all detached leaves with a single command; see the reference.

To attach an aggregator node back to a cluster, use the following syntax:

ADD AGGREGATOR user:'password'@'host':port;

For more information on this command see the reference.

Step 9: Rebalance cluster partitions

After attaching the leaf node(s) back to the cluster, run the following command on your master aggregator node for each of your databases:

REBALANCE PARTITIONS ON <db_name_here>;

For more information on this command, see the reference.

Running REBALANCE PARTITIONS redistributes data across your cluster. In doing so, a portion of the data in your cluster will be relocated to the SingleStore nodes that were attached in Step 8.
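On a large database the rebalance can take some time. A sketch of how you might monitor its progress from the master aggregator, where `db_name` is again a placeholder database name:

```sql
-- Lists the remaining partition-move operations for db_name
-- (placeholder); an empty result means the rebalance is finished.
SHOW REBALANCE STATUS ON db_name;
```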