Outdated Version

You are viewing an older version of this section. View current production version.

Installation Best Practices min read


This section lists the best practices for properly configuring your MemSQL installation.

MemSQL Database Configuration

Configure Linux ulimit settings

Most Linux operating systems provide ways to control the usage of system resources such as threads, files and network at an individual user or process level. The per-user limitations for resources are called ulimits, and they prevent single users from consuming too much system resources. For optimal performance, MemSQL recommends setting ulimits to higher values than the default Linux settings. The ulimit settings can be configured in the /etc/security/limits.conf file, or directly via shell commands.

Increase File Descriptor Limit

The MemSQL cluster uses a substantial number of client and server connections between aggregators and leaves to run queries and cluster operations. We recommend setting the Linux file descriptor limit to the highest possible value (at least 64,000) to account for these connections. Failing to increase this limit can significantly degrade performance and even cause connection limit errors.

Permanently increase this limit for all users by editing the /etc/security/limits.conf file as root, and adding the lines:

* soft NOFILE 1000000
* hard NOFILE 1000000

Alternatively, you can set the value for your session by running the following command in your shell:

ulimit -n 1000000

For more information about setting the file descriptor limit on Linux, see: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files.

Increase Maximum Process Limit

MemSQL also recommends increasing the number of processes allowed to be run by users.

Permanently increase this limit for all users by editing the /etc/security/limits.conf file as root, and adding the lines:

* soft NPROC 128000
* hard NPROC 128000

Alternatively, you can set the value for your session by running the following command in your shell:

ulimit -u 128000

Configure Linux vm settings

MemSQL recommends setting the following settings via sysctl to minimize likelihood of getting memory errors. You can use the /sbin/sysctl command to view, set and automate settings in the /proc/sys/ directory.

Permanently set these variables by editing the /etc/sysctl.conf file as root, and adding these lines:

vm.max_map_count=1000000000
vm.min_free_kbytes=500000
vm.swappiness=10
Warning

If MemSQL Ops is used to configure vm setting defaults, it will set vm.min_free_kbytes to the smaller of 1% of system RAM or 4 GB.

Disable transparent huge pages

MemSQL recommends disabling transparent huge pages (THP). Not doing so may result in inconsistent query run times. This troubleshooting article explains how to disable THP.

Create swap files

It is recommended that users explicitly configure swap files.

It is recommended that you create a swap file so that the operating system does not immediately start killing processes if MemSQL runs out of memory. Here is a sample shell script to create the swap file:

snapshot_dir="/var/lib/memsql/data/snapshots"
snapshot_file="$snapshot_dir/memsql.swp"
mkdir -p $snapshot_dir
dd if=/dev/zero of=$snapshot_file bs=1M count=10240
chmod 600 $snapshot_file
mkswap $snapshot_file
swapon $snapshot_file
echo "$snapshot_file swap swap defaults 0 0" | tee -a /etc/fstab

For more information on configuring swap files, see: http://www.cyberciti.biz/faq/linux-add-a-swap-file-howto/.

Ensure port 3306 is available on every cluster host

The default port used by MemSQL is 3306, which is configurable. For the smoothest user experience, make sure this port is accessible and unoccupied.

If another process is already running on the system and occupying port 3306, you will see an error in MemSQL similar to:

120501  3:04:15 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use 
120501  3:04:15 [ERROR] Do you already have another mysqld server running on port: 3306 ?
120501  3:04:15 [ERROR] Aborting

To workaround this, update the MemSQL port value in the memsql.cnf file to an available port.

Thereafter, connect to the database using the new port, i.e. mysql -u root -h 127.0.0.1 -P <PORT> --prompt="memsql> ".

Recommendations for Optimal On-Premise Columnstore Performance

  • We recommend the P3600 or P3700 Intel SSDs. Update the devices to the latest firmware using the Intel SSD Data Center Tool.
  • We support the EXT4 filesystem using the discard and noatime mount options. We currently do not support XFS.
  • Many improvements have been made recently in Linux for NVMe devices, so we recommend using a 3.0+ series kernel. For example, CentOS 7.2 uses the 3.10 kernel.
  • Set the following parameters in Linux (make it permanent in /etc/rc.local):
# Set ${DEVICE_NUMBER} for each device
echo 0 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/add_random
echo 1 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/rq_affinity
echo none > /sys/block/nvme${DEVICE_NUMBER}n1/queue/scheduler
echo 1023 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/nr_requests

MemSQL Ops Configuration

Disable requiretty

For the smoothest user experience of automatically provisioning the MemSQL cluster, MemSQL Ops needs the ability to SSH into other hosts outside an interactive shell/session. As such, it requires that requiretty is disabled. This can be done by modifying the /etc/sudoers file and commenting out any lines that reference requiretty.

Ensure working SSH between cluster hosts

For MemSQL Agent to successfully download and install itself in other hosts in the cluster, it needs to connect via SSH from the primary host (where the primary MemSQL Agent resides) to other hosts in the cluster. For the best user experience, check to make sure SSH is possible between the primary host and the other hosts in the cluster. If this is not possible, you will need to install the MemSQL Agent manually in all cluster hosts.

Start a new login shell after running install.sh

The install.sh script creates a memsql user and memsql group within the Linux machine, and adds the user running install.sh to the memsql group. To successfully complete the addition of your user into the memsql group, you will need to restart your shell.

Create the same Linux user with sudo permissions on every cluster host

Setting up MemSQL using the MemSQL Agent is easiest if the user and password is the same for every node. The MemSQL Agent will use this user to connect to all hosts in the cluster and install / deploy MemSQL.

Ensure port 9000 is accessible on every cluster host

The default port opened by the MemSQL Ops is 9000, which is configurable. For the smoothest user experience, open this port on every host, and ensure it is accessible via a browser. For deployments on public cloud platforms (e.g. AWS, Azure), make sure the proper security group is configured for your cluster hosts that allow public access to port 9000.

If it is not possible to open port 9000, MemSQL Ops agents can be started on other ports using the --port flag in memsql-ops start.

Install numactl on machines with multiple sockets

For optimal performance, MemSQL Ops automatically detects if a machine has multiple sockets and recommends that MemSQL be deployed in a NUMA-aware manner. Specifically, MemSQL Ops will run numactl commands to bind individual MemSQL nodes to CPUs. This allows faster access to in-memory data, since individual MemSQL nodes only access data that’s collocated with their corresponding CPU. For MemSQL Ops to enable NUMA, you need to install the numactl package in your Linux host:

sudo apt-get install numactl

Ensure /tmp has free space or change TMPDIR

MemSQL Ops, like many Unix utilities, writes temporary data to /tmp and requires available free space.

It is possible to change the temporary directory by setting the canonical Unix environment variable TMPDIR.