Introduction


MemSQL can be installed on bare metal or on virtual machines using popular configuration management tools, such as CloudFormation, or using MemSQL’s management tools.

In this tutorial, you will deploy a MemSQL cluster onto physical or virtual machines and connect to the cluster using our monitoring, profiling, and debugging tool, MemSQL Studio.

A four-node cluster is the minimum recommended cluster size for showcasing MemSQL as a distributed, highly available database; however, you can use the procedures in this tutorial to scale out to additional nodes for increased performance over large data sets or to handle higher concurrency loads. To learn more about MemSQL’s design principles and topology concepts, see Distributed Architecture.

You will learn how to deploy a cluster using APT or YUM packages through a manual set of steps that allows for customization, such as binding to multiple NUMA nodes. If you use APT or YUM packages, but do not have strict requirements on installation paths or other customization requirements, you can deploy a cluster using a single command by going through the Basic Install Guide instead.

Info

There are no licensing costs for using up to four license units for the leaf nodes in your cluster. If you need a larger cluster with more/larger leaf nodes, please create an Enterprise License trial key.

Prerequisites

For this tutorial you will need:

  • Physical or virtual machines (or “hosts”) with the following:

    • At least four (4) x86_64 CPU cores and eight GB of RAM per machine (8 vCPU and 32 GB of RAM is recommended for leaf nodes to align with license unit calculations)

    • A 64-bit version of RHEL/CentOS (6 or higher) or Debian (8 or higher)

    • Port 3306 open on all hosts for intra-cluster communication

    • Port 8080 open on the main deployment host for the cluster

    • A non-root user with sudo privileges available on all hosts in the cluster that can be used to run MemSQL services and own the corresponding runtime state

  • SSH access to all hosts (installing and using ssh-agent is recommended for SSH keys with passwords)

    • If using SSH keys, make sure the identity key used on the main deployment host can be used to log in to the other hosts.

    • Refer to How to Setup Passwordless SSH Login for more information on using SSH without a password.

  • A connection to the Internet to download required packages
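The SSH requirement above can be verified before deployment. The following is a minimal sketch, not part of MemSQL's tooling: the function name `check_ssh_access` and the user and host names in the usage comment are hypothetical. It relies only on standard OpenSSH options (`BatchMode`, `ConnectTimeout`).

```shell
# check_ssh_access USER HOST...
# Tries a non-interactive login to each host and reports the result.
# BatchMode=yes makes ssh fail instead of prompting, so a missing key or an
# unloaded ssh-agent identity shows up as FAIL rather than a password prompt.
check_ssh_access() {
  local user="$1" host failed=0
  shift
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "${user}@${host}" true; then
      echo "ok   ${host}"
    else
      echo "FAIL ${host}"
      failed=1
    fi
  done
  return "$failed"
}

# Example usage from the main deployment host (names are placeholders):
#   eval "$(ssh-agent -s)"      # start an agent for this shell
#   ssh-add ~/.ssh/id_rsa       # unlock the key once, if it has a passphrase
#   check_ssh_access memsql-admin host-1 host-2 host-3
```

A non-zero return value means at least one host could not be reached without a prompt, which would likely also block a tool-driven deployment.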

If running this in a production environment, it is highly recommended that you follow our host configuration recommendations for optimal cluster performance.
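The hardware minimums listed in the prerequisites (x86_64, four cores, 8 GB of RAM) can be checked on each host with a short script. This is a sketch under stated assumptions: the function names are hypothetical, it assumes a Linux host with `/proc/meminfo`, and the 7 GiB memory threshold is an assumed allowance for the fact that a nominal 8 GB machine reports slightly less usable memory.

```shell
# meets_minimums ARCH CORES MEM_KB
# Prints one line per failed check and returns non-zero if any check fails.
meets_minimums() {
  local arch="$1" cores="$2" mem_kb="$3" failed=0
  # Assumption: accept anything from 7 GiB upward as "8 GB of RAM",
  # because MemTotal excludes memory reserved by the kernel and firmware.
  local min_kb=$((7 * 1024 * 1024))

  if [ "$arch" != "x86_64" ]; then
    echo "FAIL: architecture is ${arch}, expected x86_64"
    failed=1
  fi
  if [ "$cores" -lt 4 ]; then
    echo "FAIL: ${cores} CPU cores, need at least 4"
    failed=1
  fi
  if [ "$mem_kb" -lt "$min_kb" ]; then
    echo "FAIL: ${mem_kb} kB of RAM, need at least ${min_kb} kB"
    failed=1
  fi
  return "$failed"
}

# check_this_host: gathers the three values from the local machine (Linux only).
check_this_host() {
  meets_minimums "$(uname -m)" "$(nproc)" \
    "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
}
```

Running `check_this_host && echo "minimums met"` on each machine gives a quick pass or fail; it does not replace the host configuration recommendations for production.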

Duplicate Hosts

As of MemSQL Toolbox 1.4.4, a check for duplicate hosts is performed before MemSQL is deployed. If more than one host has the same SSH host key, the check fails with a message similar to the following:

✘ Host check failed. host 172.26.212.166 has the same ssh
host keys as 172.16.212.165, toolbox doesn't support
registering the same host twice

Confirm that all specified hosts are indeed different and aren’t using identical SSH host keys. Identical host keys can be present if you have instantiated your host instances from images (AMIs, snapshots, etc.) that contain existing host keys. When a host is cloned, the host key (typically stored in /etc/ssh/ssh_host_<cipher>_key) will also be cloned.

As each cloned host will have the same host key, an SSH client cannot verify that it is connecting to the intended host. The script that deploys MemSQL will interpret a duplicate host key as an attempt to deploy to the same host twice, and the deployment will fail.
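To spot cloned host keys before running a deployment, the public keys that each host presents can be compared. The sketch below is an illustration, not what MemSQL Toolbox does internally: `find_duplicate_host_keys` is a hypothetical helper that reads lines in the format produced by `ssh-keyscan` (`host keytype base64key`) and reports hosts that share an identical key.

```shell
# find_duplicate_host_keys
# Reads "host keytype base64key" lines on stdin (ssh-keyscan output format)
# and prints one line for every key that is presented by more than one host.
find_duplicate_host_keys() {
  awk '
    /^#/ || NF < 3 { next }                    # skip comments and malformed lines
    {
      key = $2 " " $3                          # key type plus key material
      hosts[key] = (key in hosts) ? hosts[key] " " $1 : $1
      count[key]++
    }
    END {
      for (key in count)
        if (count[key] > 1)
          print "duplicate host key: " hosts[key]
    }
  '
}

# Example usage (host names are placeholders); -t rsa limits the scan to one
# key type so each pair of cloned hosts is reported once:
#   ssh-keyscan -t rsa host-1 host-2 host-3 2>/dev/null | find_duplicate_host_keys
```

No output means every scanned host presented a distinct key of that type.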

The CentOS 7.x steps below demonstrate a potential remedy for the “duplicate hosts” message.

$ sudo -i
# ls -al /etc/ssh/
# rm /etc/ssh/<your-ssh-host-keys>
# ssh-keygen -f /etc/ssh/<ssh-host-key-filename> -N '' -t rsa1
# ssh-keygen -f /etc/ssh/<ssh-host-rsa-key-filename> -N '' -t rsa
# ssh-keygen -f /etc/ssh/<ssh-host-dsa-key-filename> -N '' -t dsa

For more information about SSH host keys, including the equivalent steps for Ubuntu-based systems, refer to Avoid Duplicating SSH Host Keys.

As of MemSQL Toolbox 1.5.3, memsql-deploy setup-cluster supports an --allow-duplicate-host-fingerprints option that can be used to ignore duplicate SSH host keys.

Network Configuration

Depending on the host and its function in deployment, some or all of the following port settings should be enabled on hosts in your cluster.

These routing and firewall settings must be configured to:

  • Allow database clients (e.g. your application) to connect to the MemSQL aggregators

  • Allow all nodes in the cluster to talk to each other over the MemSQL protocol (3306)

  • Allow you to connect to management and monitoring tools

Protocol | Default Port | Direction | Description
TCP | 22 | Inbound and Outbound | For host access. Required between nodes in MemSQL tool deployment scenarios. Also useful for remote administration and troubleshooting on the main deployment host.
TCP | 443 | Outbound | To get public repo key for package verification. Required for nodes downloading MemSQL APT or YUM packages.
TCP | 3306 | Inbound and Outbound | Default port used by MemSQL. Required on all nodes for intra-cluster communication. Also required on aggregators for client connections.
TCP | 8080 | Inbound and Outbound | Default port for MemSQL Studio. (Only required for the host running Studio.)

The service port values are configurable if the default values cannot be used in your deployment environment. For more information on how to change them, see:

We also highly recommend configuring your firewall to prevent other hosts on the Internet from connecting to MemSQL.
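The default port requirements above can be captured in a small helper and used to test reachability between hosts. This is a sketch only: the role names `node` and `studio` and both function names are hypothetical, `port_open` depends on bash's `/dev/tcp` feature and the coreutils `timeout` command, and the firewall commands in the comments assume a host that uses firewalld.

```shell
# inbound_ports ROLE
# Prints the default TCP ports that should accept inbound connections.
#   node   - any aggregator or leaf host
#   studio - the host that also runs MemSQL Studio
inbound_ports() {
  case "$1" in
    node)   echo "22 3306" ;;
    studio) echo "22 3306 8080" ;;
    *)      echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# port_open HOST PORT
# Returns 0 if a TCP connection to HOST:PORT succeeds within 3 seconds.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example usage (host name is a placeholder):
#   for p in $(inbound_ports node); do
#     if port_open host-2 "$p"; then echo "open   $p"; else echo "closed $p"; fi
#   done
#
# On a firewalld-based host (assumption), a port might be opened with:
#   sudo firewall-cmd --permanent --add-port=3306/tcp
#   sudo firewall-cmd --reload
```

If the service ports have been changed from their defaults, the values in `inbound_ports` would need to be adjusted to match.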