Outdated Version

You are viewing an older version of this section. View current production version.

Cluster Downsizing Steps min read


Info

This topic does not apply to MemSQL Helios.

Downsizing a cluster can make sense for a few reasons:

  • You’ve determined that you can operate a cluster with less resources (which could potentially translate to cost savings)

  • Your data volume is lower than projected

  • Your data volume is higher than projected, which would justify splitting it across two clusters (e.g., migrate from a large cluster to a number of smaller clusters)

Regardless of the rationale, downsizing a cluster by removing leaf nodes is a standard, low-risk cluster operation that can be accomplished using MemSQL tools.

When removing a leaf node, the partitions from the node that is removed will be moved to other nodes in the cluster.

As the database is not being repartitioned, this puts more memory and processing stress on the remaining hosts as the number of the partitions will be sub-optimal for the reduced cluster size.

Notes on Downsizing

Info

If the cluster downsizing is permanent, we recommend backing up the databases via mysqldump or via the Backing Up and Restoring Data guide, then dropping the existing databases and reimporting the data. Doing so will repartition and evenly redistribute the databases and create an adequate number of partitions for the downsized cluster.

  • Ensure that the cluster is sized properly so the remaining nodes have enough memory to take over data from the removed nodes.

    • Confirm that the host and leaf configurations are the same across existing leaves. Sizing configuration (RAM, cores, etc.) and OS tuning (ulimit settings, THP settings, etc.) should match to ensure uniform query performance and cluster health. See System Requirements and Recommendations for more details.
  • Ensure that there are no long-running queries. If possible, throttle the application workload so the number of running queries is as low as possible. A best practice would be to halt all running workloads.

SELECT * FROM information_schema.PROCESSLIST WHERE COMMAND = 'QUERY' AND STATE = 'executing';
  • Be aware that removing a partition can take time as the partitions need to be copied to other nodes before the node is removed from the cluster. Please do not attempt to kill the running operation. You should consider addressing any potential data skew issues prior to downsizing the cluster.

  • If you are using high availability (HA), you must maintain an even number of nodes and remove paired nodes from the cluster. You can review which leaf nodes are paired by connecting to the database and running SHOW LEAVES.

Example Cluster

Alert

As the following steps are intended for use with a cluster that is managed by MemSQL tools, do not run them on a cluster that is managed by MemSQL Ops.

In the example below, a cluster with 1 Master Aggregator (MA) and 6 leaves will be reduced to 4 leaves.

Since the nodes on 10.0.0.219 and 10.0.2.35 are paired, these two nodes will be deleted. This approach should prevent any need to reconfigure the availability groups.

Below is a sample cluster prior to modification.

Note: The following commands are run from the command line on the Master Aggregator host (10.0.3.34 in this example cluster). Substitute the values in your own cluster when running these commands.

memsql-admin list-nodes
****
+------------+--------+------------+------+---------------+--------------+---------+----------------+--------------------+
| MemSQL ID  |  Role  |	Host	| Port | Process State | Connectable? | Version | Recovery State | Availability Group |
+------------+--------+------------+------+---------------+--------------+---------+----------------+--------------------+
| E2686600DF | Master | 10.0.3.34  | 3306 | Running   	| True     	| 6.8.3   | Online     	|                	|
| 79933F2F05 | Leaf   | 10.0.0.107 | 3306 | Running   	| True     	| 6.8.3   | Online     	| 1              	|
| 7AF82FFA6C | Leaf   | 10.0.0.219 | 3306 | Running   	| True     	| 6.8.3   | Online     	| 1              	|
| 0EBA8C7CE4 | Leaf   | 10.0.1.234 | 3306 | Running   	| True     	| 6.8.3   | Online     	| 2              	|
| C2A9AA5641 | Leaf   | 10.0.1.67  | 3306 | Running   	| True     	| 6.8.3   | Online     	| 2              	|
| CC425D4F59 | Leaf   | 10.0.1.75  | 3306 | Running   	| True     	| 6.8.3   | Online     	| 1              	|
| D35993788C | Leaf   | 10.0.2.35  | 3306 | Running   	| True     	| 6.8.3   | Online     	| 2              	|
+------------+--------+------------+------+---------------+--------------+---------+----------------+--------------------+

Remove Leaves

Run memsql-admin show-leaves.

Note: Preserve the output of the memsql-admin show-leaves command so that you can refer to it later.

Of particular note are Host, Availability Group (AG), and Pair Host. When possible, delete paired hosts so that the availability groups won’t need to be reconfigured.

As 10.0.0.219 and 10.0.2.35 are paired in the example cluster, it’s safe to delete their nodes. The availability groups won’t need to be reconfigured since this operation has no effect on the pairing between 10.0.0.107 and 10.0.1.67, or 10.0.1.75 and 10.0.1.234.

memsql-admin show-leaves
****
✓ Successfully ran 'memsqlctl show-leaves'
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+
|	Host	| Port | Availability Group | Pair Host  | Pair Port | State  | Opened Connections | Average Roundtrip Latency (ms) |
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+
| 10.0.0.107 | 3306 | 1              	| 10.0.1.67  | 3306  	| online | 9              	| 0.22                           |
| 10.0.1.67  | 3306 | 2              	| 10.0.0.107 | 3306  	| online | 9              	| 0.245                      	|
| 10.0.1.75  | 3306 | 1              	| 10.0.1.234 | 3306  	| online | 9              	| 0.229                      	|
| 10.0.1.234 | 3306 | 2              	| 10.0.1.75  | 3306  	| online | 9              	| 0.268                      	|
| 10.0.0.219 | 3306 | 1              	| 10.0.2.35  | 3306  	| online | 9         	     | 0.258                      	|
| 10.0.2.35  | 3306 | 2              	| 10.0.0.219 | 3306  	| online | 9              	| 0.329                      	|
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+

The last two leaves are removed because they are paired. This maintains the other pairings and therefore has no effect on HA.

memsqlctl remove-leaf --host "10.0.0.219" --port 3306
****
memsqlctl will perform the following actions on the local master aggregator on port 3306:
  · Run `REMOVE LEAF 10.0.0.219:3306`
 
Would you like to continue? [y/N]: y
✓ Removed leaf 10.0.0.219:3306 from cluster
memsqlctl remove-leaf --host "10.0.2.35" --port 3306
****
memsqlctl will perform the following actions on the local master aggregator on port 3306:
  · Run `REMOVE LEAF 10.0.2.35:3306`
 
Would you like to continue? [y/N]: y
✓ Removed leaf 10.0.2.35:3306 from cluster
memsql-admin show-leaves
****
✓ Successfully ran 'memsqlctl show-leaves'
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+
|	Host	| Port | Availability Group | Pair Host  | Pair Port | State  | Opened Connections | Average Roundtrip Latency (ms) |
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+
| 10.0.0.107 | 3306 | 1              	| 10.0.1.67  | 3306  	| online | 9              	| 0.204                      	|
| 10.0.1.67  | 3306 | 2              	| 10.0.0.107 | 3306  	| online | 9              	| 0.259                      	|
| 10.0.1.75  | 3306 | 1              	| 10.0.1.234 | 3306  	| online | 9              	| 0.226                      	|
| 10.0.1.234 | 3306 | 2              	| 10.0.1.75  | 3306  	| online | 9              	| 0.256                      	|
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+

Delete Nodes

Provided an aggregator isn’t running on the same host, you may now delete the nodes using the delete-node command.

For this example, each leaf is on its own host so deleting the entire node is safe.

memsql-admin delete-node --memsql-id 7AF82FFA6C --stop
****
Toolbox is about to stop and then delete the following nodes:
	- On host 10.0.0.219:
  	+ 7AF82FFA6C
 
Would you like to continue? [y/N]: y
Operation completed successfully
memsql-admin delete-node --memsql-id D35993788C --stop
****
Toolbox is about to stop and then delete the following nodes:
	- On host 10.0.2.35:
  	+ D35993788C
 
Would you like to continue? [y/N]: y
Operation completed successfully

Availability Groups and Removing Leaves

Note: If you don’t remove paired hosts, you will need to reconfigure your availability groups.

Both 10.0.1.75 and 10.0.0.107 were removed, which were not pairs. This results in the following two nodes being in availability group 2.

memsql-admin show-leaves
****
✓ Successfully ran 'memsqlctl show-leaves'
+------------+------+--------------------+-----------+-----------+--------+--------------------+--------------------------------+
|	Host	| Port | Availability Group | Pair Host | Pair Port | State  | Opened Connections | Average Roundtrip Latency (ms) |
+------------+------+--------------------+-----------+-----------+--------+--------------------+--------------------------------+
| 10.0.1.67  | 3306 | 2              	| null  	| null  	| online | 16             	| 0.267                      	|
| 10.0.1.234 | 3306 | 2              	| null  	| null  	| online | 16             	| 0.229                      	|
+------------+------+--------------------+-----------+-----------+--------+--------------------+--------------------------------+

The following will occur if an attempt is made to create a new database with the cluster in its current state:

memsql -u root -h localhost -p -e 'create database music';
****
Enter password:
ERROR 1782 (HY000) at line 1: CREATE DATABASE requires that all availability groups are balanced. The largest group has 2 leaves, while the smallest has 0. The difference must be at most one.

To rectify this, remove the two remaining leaves and then add them back into AG 1 and AG 2 so that they form a pair.

memsqlctl remove-leaf --host "10.0.1.67" --port 3306
****
memsqlctl will perform the following actions on the local master aggregator on port 3306:
  · Run `REMOVE LEAF 10.0.1.67:3306`
 
Would you like to continue? [y/N]: y
✓ Removed leaf 10.0.1.67:3306 from cluster
memsqlctl remove-leaf --host "10.0.1.234" --port 3306
memsqlctl will perform the following actions on the local master aggregator on port 3306:
  · Run `REMOVE LEAF 10.0.1.234:3306`
 
Would you like to continue? [y/N]: y
✓ Removed leaf 10.0.1.234:3306 from cluster
memsql-admin add-leaf --host 10.0.1.234 --port 3306 --password iamazon --availability-group 1
Toolbox will perform the following actions on host 10.0.3.34:
  · Run 'memsqlctl add-leaf --host 10.0.1.234 --port 3306 --user root --password ●●●●●● --availability-group 1'
 
Would you like to continue? [y/N]: y
✓ Successfully ran 'memsqlctl add-leaf'
Operation completed successfully
memsql-admin add-leaf --host 10.0.1.67 --port 3306 --password ●●●●●● --availability-group 2
****
Toolbox will perform the following actions on host 10.0.3.34:
  · Run 'memsqlctl add-leaf --host 10.0.1.67 --port 3306 --user root --password ●●●●●● --availability-group 2'
 
Would you like to continue? [y/N]: y
✓ Successfully ran 'memsqlctl add-leaf'
Operation completed successfully
memsql-admin show-leaves
****
✓ Successfully ran 'memsqlctl show-leaves'
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+
|	Host	| Port | Availability Group | Pair Host  | Pair Port | State  | Opened Connections | Average Roundtrip Latency (ms) |
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+
| 10.0.1.234 | 3306 | 1              	| 10.0.1.67  | 3306  	| online | 1              	| 0.251                      	|
| 10.0.1.67  | 3306 | 2              	| 10.0.1.234 | 3306  	| online | 2              	| 0.249                      	|
+------------+------+--------------------+------------+-----------+--------+--------------------+--------------------------------+

This configuration now allows for a new database to be created.

Rebalance Partitions

Once all of the designated nodes have been removed, run REBALANCE PARTITIONS on the affected databases.

On the Master Aggregator host:

  1. Use the MemSQL client to connect to MemSQL.

  2. Run the SNAPSHOT DATABASE command to bring transaction logs up to date.

    As a general rule, taking a snapshot is helpful before any maintenance/clustering operation where data is being moved or recovered as it compresses files by combining multiple statements into one.

  3. Run REBALANCE PARTITIONS ON <db_name>.

    Perform this step for all of the databases in your cluster.

    Alternatively, the following script will rebalance the partitions on all the databases in the cluster when run on the Master Aggregator host.

    #!/bin/bash
       
    for DB in `mysql -u root -h 127.0.0.1 --batch --skip-pager --skip-column-names --execute="SHOW DATABASES" | grep -vE "^(memsql|information_schema|sharding)$"`
    do
      echo "Rebalancing partitions on DB $DB"
      mysql -u root -h 127.0.0.1 --batch --execute "REBALANCE PARTITIONS ON $DB"
    done