Taking Leaves Offline without Cluster Downtime (MemSQL Ops)
Please follow this guide to learn how to migrate to SingleStore tools.
This topic does not apply to MemSQL Helios.
Occasionally hosts need to be taken offline for maintenance (upgrading memory, etc). This can present a challenge if these hosts are home to one or more MemSQL leaf nodes.
By following the steps below, you can detach MemSQL leaf nodes from a MemSQL cluster, take the host offline for maintenance, and attach the leaves back to the cluster following maintenance. This can all be done without downtime to the MemSQL cluster.
Assumptions:
-
The steps below assume the host IP addresses will not change during maintenance.
-
The steps below assume the cluster is configured for High Availability (redundancy 2). If both leaves in a paired group of leaves are detached from the cluster, the cluster will become unavailable and downtime will be experienced. For this reason only one availability group of leaves should be detached at a time.
Step 1: Enable manual cluster control of the MemSQL cluster
Manual cluster control is enabled by running the following command:
memsql-ops cluster-manual-control --enable
For more information on this command, see the reference.
The cluster is now under manual control. We will not automatically restart stopped nodes, update license files, etc.
Enabling manual cluster control will prevent the leaf nodes we remove from automatically being re-added to the cluster.
You can check the current status (enabled or disabled) of manual cluster control in the MemSQL Ops Dashboard:
Step 2: Check long running queries section of Ops web UI
Before removing leaves make sure there are no long running queries present in the cluster:
Step 3: Ensure all database partitions are balanced
Read the Understanding Orphaned Partitions topic to verify if any orphaned partitions exist in the cluster. If there are, this topic explains how to resolve them.
Step 4. Confirm the leaf node you want to take offline has an online paired leaf on a different host
To confirm this, run SHOW LEAVES
and check the results. Suppose you have a leaf node running on 192.168.1.110
and you want to take it offline. To confirm it has an online paired leaf, run SHOW LEAVES
and observe that this node’s paired host is 192.168.1.112
and the paired host is online:
+----------------+------+--------------------+---------------+------------+--------+--------------------+------------------------------+--------+
| Host | Port | Availability_Group | Pair_Host | Pair_Port | State | Opened_Connections | Average_Roundtrip_Latency_ms | NodeId |
+------------+------+--------------------+------------+-----------+--------+--------------------+------------------------------+-------+--------+
| 192.168.1.110 | 3307 | 1 | 192.168.1.112 | 3307 | online | 2 | 0.168 | 6 |
| 192.168.1.112 | 3307 | 2 | 192.168.1.110 | 3307 | online | 2 | 0.255 | 8 |
+------------+------+--------------------+------------+-----------+--------+--------------------+------------------------------+-------+--------+
2 rows in set (0.00 sec)
Step 5: Detach the MemSQL leaf node(s) from the host to be taken offline for maintenance
A MemSQL leaf node is detached from a MemSQL cluster by using the following syntax:
DETACH LEAF'host':port;
For more information on this command see the reference.
Note: If both leaves in a paired group of leaves are detached from the cluster will become unavailable and downtime will be experienced. For this reason only one availability group of leaves should be detached at a time.
Step 6: Detach the MemSQL aggregator(s) from the host to be taken offline for maintenance
Use the following syntax to detach an aggregator from a host:
REMOVE AGGREGATOR 'host':port;
For more information on this command see the reference.
Step 7: Stop the MemSQL node(s)
Stop the MemSQL node(s) (leaves and aggregators) residing on all hosts that will be taken offline for maintenance. Stop the MemSQL nodes using memsql-ops memsql-stop
.
For more information on this command see the reference.
Step 8: Stop the Ops follower agent
Stop the Ops follower agent residing on all hosts that will be taken offline for maintenance. You can shut down a follower agent using memsql-ops agent-stop
OR by SSHing directly to the host and running memsql-ops stop
.
For more information on this command see the reference.
Step 9: Take the host offline, perform maintenance, bring host back online and confirm MemSQL is running
It is now safe to power down the host and perform maintenance. After performing maintenance bring the host back online.
Step 10: Start the Ops follower agent
Start the Ops follower agent residing on all hosts that were previously taken offline for maintenance and are now back online. You can start a follower agent using memsql-ops agent-start
OR by SSHing directly to the host and running memsql-ops start
.
Step 11: Start the MemSQL node(s)
Start the MemSQL node(s) (leaves and aggregators) residing on all hosts that were previously taken offline for maintenance and are now back online. Start the MemSQL node(s) using memsql-ops memsql-start
.
Step 12: Attach the MemSQL leaf node(s) back to the host that was taken offline for maintenance
Once maintenance is completed, the host is back online and MemSQL is running attach the MemSQL leaf node(s) back to the cluster.
A MemSQL leaf node is attached to a MemSQL cluster by using the following command from the master aggregator node:
ATTACH LEAF 'host':port NO REBALANCE;
For more information on this command, see the reference.
Note: If you took multiple leaf nodes offline and are attaching them back to the cluster you can use the reference to attach all detached leaves with one command:
Step 13: Attach the MemSQL aggregator(s) back to the host that was taken offline for maintenance
Use the following syntax to attach an aggregator from a host:
ADD AGGREGATOR user:'password'@'host':port;
For more information on this command see the reference.
Step 14: Disable manual control of the MemSQL cluster
Manual cluster control is disabled by running the following command:
memsql-ops cluster-manual-control --disable
For more information on this command, see the reference. The cluster is no longer under manual control. We will now automatically restart stopped nodes, update license files, etc.
Step 15: Run REBALANCE PARTITIONS
After attaching the MemSQL leaf node(s) to the host run the following command on your MemSQL master aggregator node for each of your databases:
REBALANCE PARTITIONS ON <db_name_here>;
For more information on this command, see the reference.
Running rebalance partitions will redistribute data across your cluster. In doing so a portion of data in your cluster will be relocated to the MemSQL nodes that were attached in step 12.