MemSQL replication allows database replication between MemSQL instances, executed at the partition level. It is simple, robust, and fast. This topic describes how to use replication in MemSQL.
MemSQL replication is fully online. In the middle of a continuous write workload, you can start replication to a secondary (replica) cluster without pausing the primary (source) cluster. Replication then creates a read-only database replica that can be used for disaster recovery or to serve additional reads.
Replication across clusters, which includes cross datacenter replication, only supports asynchronous mode. In asynchronous mode, writes on the primary cluster will never wait to be replicated to the secondary cluster. Furthermore, secondary cluster failures will never block the master.
In contrast, replication within a cluster, used for maintaining MemSQL high availability, can be either asynchronous or synchronous. In synchronous mode, writes to any master partition will be replicated to its corresponding replica partition first before an acknowledgement is returned to the user. For more information on within-cluster replication, please refer to Managing High Availability.
The rest of this article is focused exclusively on cross-cluster replication.
To replicate a database, the secondary cluster user must have
CREATE DATABASE
privileges and the primary cluster user (the one
specified in REPLICATE DATABASE
) must have REPLICATION
privileges on the primary cluster’s master aggregator.
Replication Commands
Replication in MemSQL is controlled entirely from the secondary cluster, where the following commands are run:
- REPLICATE DATABASE
- PAUSE REPLICATING
- CONTINUE REPLICATING
- STOP REPLICATING
- SHOW REPLICATION STATUS
- SHOW PARTITIONS
- SHOW CLUSTER STATUS
Note that most of these commands are also applicable to within-cluster replication.
Setting Up Replication
This example will guide you through setting up replication of a database. These instructions assume that you have two MemSQL clusters running. The following host:port
combinations represent the master aggregators of the primary and secondary clusters:
primary-MA:3306
secondary-MA:3306
Note that the primary and secondary clusters need not have identical topologies. MemSQL will automatically manage sharding of replica data on the secondary cluster. In this example, primary-MA
has a root user with an empty password.
To begin replicating the database db_name
from primary-MA
, run the following command on secondary-MA
:
memsql> REPLICATE DATABASE db_name FROM root:root_password@primary-MA:3306;
Note that multiple secondary clusters can replicate from a single primary cluster. To do this, run REPLICATE DATABASE
on the master aggregator of each of the replica clusters.
Pausing and Stopping Replication
MemSQL allows users to pause and resume online replication with single commands.
memsql> PAUSE REPLICATING db_name;
Query OK, 1 row affected (0.06 sec)
memsql> CONTINUE REPLICATING db_name;
Query OK, 1 row affected (0.96 sec)
PAUSE REPLICATING
temporarily pauses replication but maintains the replication relationship between master and secondary databases. To begin replicating from a different primary cluster, you must start a new REPLICATE DATABASE
process.
STOP REPLICATING db_name
halts replication of db_name
and automatically promotes the db_name
instance on the secondary cluster to a “full” MemSQL database with all read, write, DDL, and DML operations. Once replication on a cluster has been stopped, it cannot be restarted. To resume replication on an individual replica partition, you can use CONTINUE REPLICATING with the FORCE
option.
Monitoring Replication
SHOW PARTITIONS EXTENDED
Running SHOW PARTITIONS EXTENDED
on secondary-MA
will display information such as replication role and from where each partition is replicating.
memsql> SHOW PARTITIONS EXTENDED;
+---------+-----------+------+--------+--------+----------------------------------------------------------------------------+--------------+------------+--------------+
| Ordinal | Host | Port | Role | Locked | Info | Last Command | Last Error | Last Message |
+---------+-----------+------+--------+--------+----------------------------------------------------------------------------+--------------+------------+--------------+
| 0 | 10.0.3.10 | 3306 | Master | 0 | Primary master on '10.0.3.6':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 1 | 10.0.3.10 | 3306 | Master | 0 | Primary master on '10.0.3.8':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 2 | 10.0.3.10 | 3306 | Master | 0 | Primary master on '10.0.3.6':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 3 | 10.0.3.12 | 3306 | Master | 0 | Primary master on '10.0.3.8':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 4 | 10.0.3.12 | 3306 | Master | 0 | Primary master on '10.0.3.6':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 5 | 10.0.3.12 | 3306 | Master | 0 | Primary master on '10.0.3.8':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 6 | 10.0.3.10 | 3306 | Master | 0 | Primary master on '10.0.3.6':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 7 | 10.0.3.12 | 3306 | Master | 0 | Primary master on '10.0.3.8':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 8 | 10.0.3.12 | 3306 | Master | 0 | Primary master on '10.0.3.6':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 9 | 10.0.3.10 | 3306 | Master | 0 | Primary master on '10.0.3.8':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 10 | 10.0.3.10 | 3306 | Master | 0 | Primary master on '10.0.3.6':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 11 | 10.0.3.10 | 3306 | Master | 0 | Primary master on '10.0.3.8':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 12 | 10.0.3.12 | 3306 | Master | 0 | Primary master on '10.0.3.6':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 13 | 10.0.3.12 | 3306 | Master | 0 | Primary master on '10.0.3.8':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 14 | 10.0.3.12 | 3306 | Master | 0 | Primary master on '10.0.3.6':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
| 15 | 10.0.3.10 | 3306 | Master | 0 | Primary master on '10.0.3.8':3306 at 0:0 [ MASTER=yes SYNC=no ATTACH=yes ] | CREATE | 0 | |
+---------+-----------+------+--------+--------+----------------------------------------------------------------------------+--------------+------------+--------------+
SHOW CLUSTER STATUS
Running SHOW CLUSTER STATUS
provide information like log replay
position.
memsql> SHOW CLUSTER STATUS;
+-----------+------+----------+-----------+-------------+-----------+---------+
| Host | Port | Database | Role | State | Position | Details |
+-----------+------+----------+-----------+-------------+-----------+---------+
| 10.0.3.10 | 3306 | tname | REFERENCE | replicating | 0:2426 | |
| 10.0.3.10 | 3306 | tname_0 | MASTER | replicating | 0:4034560 | |
| 10.0.3.10 | 3306 | tname_1 | MASTER | replicating | 0:3992363 | |
| 10.0.3.10 | 3306 | tname_10 | MASTER | replicating | 0:4008804 | |
| 10.0.3.10 | 3306 | tname_11 | MASTER | replicating | 0:3968388 | |
| 10.0.3.10 | 3306 | tname_15 | MASTER | replicating | 0:4034012 | |
| 10.0.3.10 | 3306 | tname_2 | MASTER | replicating | 0:3985924 | |
| 10.0.3.10 | 3306 | tname_6 | MASTER | replicating | 0:4021956 | |
| 10.0.3.10 | 3306 | tname_9 | MASTER | replicating | 0:4011270 | |
| 10.0.3.12 | 3306 | tname | REFERENCE | replicating | 0:2426 | |
| 10.0.3.12 | 3306 | tname_12 | MASTER | replicating | 0:4001954 | |
| 10.0.3.12 | 3306 | tname_13 | MASTER | replicating | 0:3974690 | |
| 10.0.3.12 | 3306 | tname_14 | MASTER | replicating | 0:4066618 | |
| 10.0.3.12 | 3306 | tname_3 | MASTER | replicating | 0:3932348 | |
| 10.0.3.12 | 3306 | tname_4 | MASTER | replicating | 0:4029080 | |
| 10.0.3.12 | 3306 | tname_5 | MASTER | replicating | 0:4006201 | |
| 10.0.3.12 | 3306 | tname_7 | MASTER | replicating | 0:4069778 | |
| 10.0.3.12 | 3306 | tname_8 | MASTER | replicating | 0:3956332 | |
+-----------+------+----------+-----------+-------------+-----------+---------+
SHOW DATABASES EXTENDED
SHOW DATABASES EXTENDED
is another useful command for monitoring replication status. The output summarizes the replication status and various other information about the state of the databases present in a MemSQL cluster.
memsql> SHOW DATABASES EXTENDED;
+---------------------------+---------+-------------+-------------+----------+---------+-------------+------------+--------------+
| Database | Commits | Role | State | Position | Details | AsyncSlaves | SyncSlaves | Memory (MBs) |
+---------------------------+---------+-------------+-------------+----------+---------+-------------+------------+--------------+
| information_schema | 35 | master | online | 0:87170 | | 0 | 0 | 0.00 |
| memsql | 17 | master | online | 0:1039 | | 0 | 0 | 0.00 |
| sharding | 40 | master | online | 0:1759 | | 0 | 0 | 0.00 |
| sharding_6683bab69542952c | 5 | async slave | replicating | 0:320 | | 0 | 0 | 0.00 |
| t | 18 | async slave | replicating | 0:2426 | | 2 | 0 | 0.00 |
+---------------------------+---------+-------------+-------------+----------+---------+-------------+------------+--------------+
5 rows in set (0.00 sec)
Be cautious when using the SYNC option when replicating a database.
This is not supported for cross datacenter replication. The
REPLICATE DATABASE statements
will not wait for provisioning to complete before returning, even if
used with the SYNC
option. If you decide to use replication with
SYNC
, check whether all SYNC replicas of a given master are fully
synchronized (or are still in the process of catching up with the
master) by using the SHOW DATABASES command with the EXTENDED
option, specifically looking for the SyncSlaves
column. If there are
any SYNC replicas connected that have not fully synchronized yet they
will appear in parentheses within the SyncSlaves
column. For
example:
- If you used the
SYNC
command to replicate the database, you would run a command similar to the one below on the secondary host:
memsql> REPLICATE DATABASE SYNC ExampleSlave FROM root@master-host/Example;
Query OK, 1 row affected (0.32 sec)
- Run the SHOW DATABASES command with the
EXTENDED
option on the ‘master-host’ to check ifExampleSlave
is synchronized.
2.1 If
ExampleSlave
is not yet synchronized, you should see the following:memsql> SHOW DATABASES EXTENDED LIKE 'Example'; +--------------------+---------+--------+--------+-------------+---------+-------------+------------+ | Database | Commits | Role | State | Position | Details | AsyncSlaves | SyncSlaves | +--------------------+---------+--------+--------+-------------+---------+-------------+------------+ | Example | 91231 | master | online | 0:246251302 | | 0 | 0 (1) | +--------------------+---------+--------+--------+-------------+---------+-------------+------------+ 1 row in set (0.00 sec)
2.1 If
ExampleSlave
is synchronized, you should see the following:mysql> SHOW DATABASES EXTENDED LIKE 'Example'; +--------------------+---------+--------+--------+-------------+---------+-------------+------------+ | Database | Commits | Role | State | Position | Details | AsyncSlaves | SyncSlaves | +--------------------+---------+--------+--------+-------------+---------+-------------+------------+ | Example | 91231 | master | online | 0:246251302 | | 0 | 1 | +--------------------+---------+--------+--------+-------------+---------+-------------+------------+ 1 row in set (0.00 sec)
Until the synchronous replica fully synchronizes itself with the master, it will behave like an asynchronous replica.
SHOW REPLICATION STATUS
Running SHOW REPLICATION STATUS
on a node shows the status of every replication process running on that node. The following is an example of the output of SHOW REPLICATION STATUS run on secondary-MA
. Note that this example follows the naming conventions established in Setting Up Replication.
memsql> SHOW REPLICATION STATUS;
+-------------+---------------------------+--------------------------+----------------+---------------------------+-----------+-----------------+-------------+----------------+---------------+--------------+
| Role | Database | MasterURI | MasterPosition | SlaveURI | Connected | NetworkPosition | SlaveState | ReplayPosition | MasterCommits | SlaveCommits |
+-------------+---------------------------+--------------------------+----------------+---------------------------+-----------+-----------------+-------------+----------------+---------------+--------------+
| async slave | sharding_6683bab69542952c | primary-MA:3306/sharding | 0:320 | NULL | yes | 0:320 | replicating | 0:320 | 5 | 5 |
| async slave | db_name | primary-MA:3306/db_name | 0:2426 | NULL | yes | 0:2426 | replicating | 0:2426 | 18 | 18 |
| master | db_name | NULL | 0:2426 | secondary-L1:3306/db_name | yes | 0:2426 | replicating | 0:2426 | 18 | 18 |
| master | db_name | NULL | 0:2426 | secondary-L2:3306/db_name | yes | 0:2426 | replicating | 0:2426 | 18 | 18 |
+-------------+---------------------------+--------------------------+----------------+---------------------------+-----------+-----------------+-------------+----------------+---------------+--------------+
In this example, the first line describes replication of the sharding
database on primary-MA
to the sharding_6683bab69542952c
database on secondary-MA
. The sharding
database exists on the master aggregator and stores metadata that defines how data is partitioned. REPLICATE DATABASE
automatically creates a sharding_[hash]
database on the secondary cluster which stores partition metadata about the primary cluster. The second line describes replication of metadata and reference tables for the db_name
database. This data is replicated synchronously to all aggregators and asynchronously to all leaves. The third and fourth lines describe replication of db_name
metadata and reference tables from secondary-MA
to the secondary cluster’s two leaf
nodes (secondary-L1
and secondary-L2
).
NetworkPosition
uses the format
[log file ordinal]:[byte offset into log file]
.
The following is the output of SHOW REPLICATION STATUS run on
secondary-L1
. In this example, db_name_[ordinal]
refers to a partition of the sharded db_name
database.
memsql> SHOW REPLICATION STATUS;
+-------------+------------+-----------------------------+----------------+----------+-----------+-----------------+-------------+----------------+---------------+--------------+
| Role | Database | MasterURI | MasterPosition | SlaveURI | Connected | NetworkPosition | SlaveState | ReplayPosition | MasterCommits | SlaveCommits |
+-------------+------------+-----------------------------+----------------+----------+-----------+-----------------+-------------+----------------+---------------+--------------+
| async slave | db_name | secondary-MA:3306/db_name | 0:2426 | NULL | yes | 0:2426 | replicating | 0:2426 | 18 | 18 |
| async slave | db_name_0 | primary-L1:3306/db_name_0 | 0:4034560 | NULL | yes | 0:4034560 | replicating | 0:4034560 | 2 | 2 |
| async slave | db_name_1 | primary-L2:3306/db_name_1 | 0:3992363 | NULL | yes | 0:3992363 | replicating | 0:3992363 | 2 | 2 |
| async slave | db_name_10 | primary-L1:3306/db_name_10 | 0:4008804 | NULL | yes | 0:4008804 | replicating | 0:4008804 | 2 | 2 |
| async slave | db_name_11 | primary-L2:3306/db_name_11 | 0:3968388 | NULL | yes | 0:3968388 | replicating | 0:3968388 | 2 | 2 |
| async slave | db_name_15 | primary-L2:3306/db_name_15 | 0:4034012 | NULL | yes | 0:4034012 | replicating | 0:4034012 | 2 | 2 |
| async slave | db_name_2 | primary-L1:3306/db_name_2 | 0:3985924 | NULL | yes | 0:3985924 | replicating | 0:3985924 | 2 | 2 |
| async slave | db_name_6 | primary-L1:3306/db_name_6 | 0:4021956 | NULL | yes | 0:4021956 | replicating | 0:4021956 | 2 | 2 |
| async slave | db_name_9 | primary-L2:3306/db_name_9 | 0:4011270 | NULL | yes | 0:4011270 | replicating | 0:4011270 | 2 | 2 |
+-------------+------------+-----------------------------+----------------+----------+-----------+-----------------+-------------+----------------+---------------+--------------+
9 rows in set (0.01 sec)
In this sample output, the first line refers to replication of the reference database (metadata) for db_name
. This data is replicated from primary-MA
to secondary-MA
from which is is replicated to each leaf node in the secondary cluster. The remaining lines refer to replication of the partitions of the sharded database db_name
. As you can see, the data is replicated directly from leaf nodes in the primary cluster to leaf nodes in the secondary cluster. In this example, secondary-L1
is receiving data from both primary-L1
and primary-L2
.
Finally, note that MemSQL will automatically take the steps necessary to ensure the secondary cluster is consistent with the primary cluster. For example, if a leaf node in a primary cluster with redundancy 2 and a replica partition on the secondary cluster gets ahead of a replica partition on the primary cluster (due to network or other irregularity), MemSQL will automatically drop and reprovision the replica partition on the secondary cluster such that it will be consistent with the recently promoted master partition on the primary cluster.
Failover and Promotion
There are a number of failure cases to consider when discussing MemSQL replication.
Primary Cluster Master Aggregator Failure
If the master aggregator for the primary cluster fails, replication of reference and metadata will stop, but distributed tables will continue replicating because this data flows directly from leaves on the primary cluster to leaves on the secondary cluster. To resume replication of reference and metadata, do the following:
- Pause replication from the master aggregator of the secondary cluster using
PAUSE REPLICATING db_name
. - Promote a child aggregator in the primary cluster to master. For more information see AGGREGATOR SET AS MASTER.
- Resume replication from the master aggregator of the secondary cluster pointed at the new primary cluster master aggregator using the following command:
memsql> CONTINUE REPLICATING db_name [ASYNC | SYNC] FROM primary_user[:primary_password]@primary-MA[:port][/db_name] [FORCE]
Secondary Cluster Master Aggregator Failure
If the master aggregator for the secondary cluster fails, like the previous failure case, replication of reference tables and metadata will stop, but distributed tables will continue replicating because this data flows directly from leaves on the primary cluster to leaves on the secondary cluster. To resume replication of reference and metadata, do the following:
- Promote a child aggregator in the secondary cluster to master. For more information see AGGREGATOR SET AS MASTER.
- Pause replication from the new master aggregator of the secondary cluster using
PAUSE REPLICATING db_name
. - Resume replication from the new master aggregator of the secondary cluster pointed at the primary cluster master aggregator using the following command:
memsql> CONTINUE REPLICATING db_name [ASYNC | SYNC] FROM primary_user[:primary_password]@primary-MA[:port][/db_name] [FORCE]
Failover to a Secondary Cluster
If the primary cluster fails and it becomes necessary to failover to the secondary cluster, do the following:
- Stop replication by running
STOP REPLICATING db_name
on the secondary cluster master aggregator. The secondary cluster is now a “full” MemSQL cluster has full read, write, DDL, and DML capabilities. - Re-point your application at an aggregator in the secondary cluster. Note that, after running
STOP REPLICATING
, you cannot resume replicating from the primary cluster.
Tuning Replication
MemSQL Replication is tuned by default to be efficient and fast for most workloads. However, MemSQL exposes several configuration hooks to help you fine-tune performance.
Snapshot and Log Files
MemSQL replication works by transferring snapshot and log files from master to replica. When replication is initiated, the replica requests the snapshot file from the master and proceeds to provision from the snapshot. Once provisioning is complete, replication proceeds by shipping transactions directly from the log file.
As described in durability_configuration, MemSQL exposes the snapshots-to-keep
variable to let you tune how many old versions of the snapshot and log files to keep around. MemSQL will automatically delete files that fall out of this window. In the context of replication, if a network outage causes the replica to fall so behind that its position in the log has been rotated out, then it is re-provisioned from the current snapshot so that replication can proceed from an existing log file.
You can tune snapshot-trigger-size
and snapshots-to-keep
to optimize the server for your network. A larger snapshot-trigger-size
increases the length of each log and therefore offers more tolerance in the event of a sluggish network. However, it will decrease the frequency at which
snapshots are taken and increase MemSQL recovery time due to larger logs (snapshot recovery is parallel, log recovery is single threaded).
By increasing snapshots-to-keep
, you can effectively increase how long log files are kept around. If you increase these parameters, make sure to allocate enough disk space to account for the larger
(snapshot-trigger-size
) and extra (snapshots-to-keep
) files. If a replica partition falls more than snapshots-to-keep
(which is a positive integer) behind the master partition, primary cluster master aggregator will automatically reprovision that replica partition.
Using mysqldump to Extract Data From a Secondary database
When mysqldump
is run on a secondary database following the instructions in Exporting Data From MemSQL, an error will occur. This error happens because mysqldump
runs LOCK TABLES
which isn’t permitted on a secondary database. mysqldump
can be configured to avoid
locking tables by passing the option --lock-tables=false
. So, to take a consistent mysqldump
of a secondary database called secondary_db
we recommend the following:
Note that pausing replication is only required if you want a consistent mysqldump
when concurrent writes are happening on the master.