Outdated Version

You are viewing an older version of this section. View current production version.

Installation Best Practices min read


This section lists the best practices for properly configuring your MemSQL installation.

MemSQL Database Configuration

Configure Linux ulimit settings

Most Linux operating systems provide ways to control the usage of system resources such as threads, files and network at an individual user or process level. The per-user limitations for resources are called ulimits, and they prevent single users from consuming too much system resources. For optimal performance, MemSQL recommends setting ulimits to higher values than the default Linux settings. The ulimit settings can be configured in the /etc/security/limits.conf file, or directly via shell commands.

Increase File Descriptor Limit

The MemSQL cluster uses a substantial number of client and server connections between aggregators and leaves to run queries and cluster operations. We recommend setting the Linux file descriptor limit to the highest possible value (at least 64,000) to account for these connections. Failing to increase this limit can significantly degrade performance and even cause connection limit errors.

Permanently increase the open files limit for all users by editing the /etc/security/limits.conf file as root, and adding the lines:

* soft NOFILE 1000000
* hard NOFILE 1000000

Then reboot the host and verify the open files limit with ulimits -a and cat /proc/memsqld_pid/limits

Check the value of cat /proc/sys/fs/file-max. If fs.file-max is less than 1000000 permanently increase it by editing the /etc/sysctl.conf file as root, and adding the lines:

fs.file-max = 1000000

Then reboot the host and verify the value of fs.file-max using cat /proc/sys/fs/file-max

For more information about setting the file descriptor limit on Linux, see: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files

Increase Maximum Process Limit

MemSQL also recommends increasing the number of processes allowed to be run by users.

Permanently increase the max user processes limit for all users by editing the /etc/security/limits.conf file as root, and adding the lines:

* soft NPROC 128000
* hard NPROC 128000

Then reboot the host and verify the value of max user processes using ulimit -a and cat /proc/memsqld_pid/limits

Configure Linux vm settings

MemSQL recommends setting the following settings via sysctl to minimize likelihood of getting memory errors. You can use the /sbin/sysctl command to view, set and automate settings in the /proc/sys/ directory.

Permanently set these variables by editing the /etc/sysctl.conf file as root, and adding these lines:

vm.max_map_count=1000000000
vm.min_free_kbytes=500000
vm.swappiness=10
Warning

If MemSQL Ops is used to configure vm setting defaults, it will set vm.min_free_kbytes to the smaller of 1% of system RAM or 4 GB.

Disable transparent huge pages

MemSQL recommends disabling transparent huge pages (THP). Not doing so may result in inconsistent query run times. This troubleshooting article explains how to disable THP.

Create swap files

It is recommended that you create a swap partition (or swap file on a dedicated device) to serve as an emergency backing store for RAM. MemSQL makes extensive use of RAM (especially with rowstore tables), so it is important that the operating system does not immediately start killing processes if MemSQL runs out of memory. Because typical machines running MemSQL have a large amount of RAM (>32 GB/node), the swap space can be relatively small. For example, Ubuntu recommends 6 GB of swap space for 32 GB of RAM and RedHat recommends at least 4 GB of swap space for 8 - 64 GB of RAM.

For more information on size recommendations and to configure swap space, please refer to your distribution’s documentation.

Ensure port 3306 is available on every cluster host, or change the MemSQL port assignment

The default port used by MemSQL is 3306, which is configurable.

If another process is already running on the system and occupying port 3306, you will see an error in MemSQL similar to:

120501  3:04:15 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use 
120501  3:04:15 [ERROR] Do you already have another mysqld server running on port: 3306 ?
120501  3:04:15 [ERROR] Aborting

To workaround this, update the MemSQL port value in the memsql.cnf file to an available port. For more information on changing the port assignment, see the MemSQL Configuration page.

Thereafter, connect to the database using the new port, i.e. mysql -u root -h 127.0.0.1 -P <PORT> --prompt="memsql> ".

Recommendations for Optimal On-Premise Columnstore Performance

  • We support the EXT4 and XFS filesystems.
  • Many improvements have been made recently in Linux for NVMe devices, so we recommend using a 3.0+ series kernel. For example, CentOS 7.2 uses the 3.10 kernel.
  • Set the following parameters in Linux (make it permanent in /etc/rc.local):
# Set ${DEVICE_NUMBER} for each device
echo 0 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/add_random
echo 1 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/rq_affinity
echo none > /sys/block/nvme${DEVICE_NUMBER}n1/queue/scheduler
echo 1023 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/nr_requests

MemSQL Ops Configuration

Disable requiretty

For the smoothest user experience of automatically provisioning the MemSQL cluster, MemSQL Ops needs the ability to SSH into other hosts outside an interactive shell/session. As such, it requires that requiretty is disabled. This can be done by modifying the /etc/sudoers file and commenting out any lines that reference requiretty.

Ensure working SSH between cluster hosts

For MemSQL Agent to successfully download and install itself in other hosts in the cluster, it needs to connect via SSH from the primary host (where the primary MemSQL Agent resides) to other hosts in the cluster. For the best user experience, check to make sure SSH is possible between the primary host and the other hosts in the cluster. If this is not possible, you will need to install the MemSQL Agent manually in all cluster hosts.

Start a new login shell after running install.sh

The install.sh script creates a memsql user and memsql group within the Linux machine, and adds the user running install.sh to the memsql group. To successfully complete the addition of your user into the memsql group, you will need to restart your shell.

Create the same Linux user with sudo permissions on every cluster host

Setting up MemSQL using the MemSQL Agent is easiest if the user and password is the same for every node. The MemSQL Agent will use this user to connect to all hosts in the cluster and install / deploy MemSQL.

Ensure port 9000 is accessible on every cluster host, or change the MemSQL Ops port assignment

The default port opened by the MemSQL Ops is 9000, which is configurable. For the smoothest user experience, open this port, or the port of your choosing, on every host, and ensure it is accessible via a browser. For deployments on public cloud platforms (e.g. AWS, Azure), make sure the proper security group is configured for your cluster hosts that allow public access to port 9000 or your chosen port.

If you would like to change the port assignment from the default 9000, MemSQL Ops agents can be started on other ports using the --port flag in memsql-ops start.

Configure MemSQL for NUMA on machines with multiple NUMA nodes

For optimum performance on multi-socket processors, you should set up and configure MemSQL for Non-Uniform Memory Access (NUMA) as part of your installation process. See Configuring MemSQL for Non-Uniform Memory Access (NUMA) for more information on how to do this.

Ensure /tmp has free space or change TMPDIR

MemSQL Ops, like many Unix utilities, writes temporary data to /tmp and requires available free space.

It is possible to change the temporary directory by setting the canonical Unix environment variable TMPDIR.