This section lists the best practices for properly configuring your MemSQL installation.
MemSQL Database Configuration
Configure Linux ulimit
settings
Most Linux operating systems provide ways to control the usage of system resources such as threads, files and network at an individual user or process level. The per-user limitations for resources are called ulimits, and they prevent single users from consuming too much system resources. For optimal performance, MemSQL recommends setting ulimits to higher values than the default Linux settings. The ulimit settings can be configured in the /etc/security/limits.conf
file, or directly via shell commands.
Increase File Descriptor Limit
The MemSQL cluster uses a substantial number of client and server connections between aggregators and leaves to run queries and cluster operations. We recommend setting the Linux file descriptor limit to the highest possible value (at least 64,000) to account for these connections. Failing to increase this limit can significantly degrade performance and even cause connection limit errors.
Permanently increase the open files limit for all users by editing the /etc/security/limits.conf
file as root, and adding the lines:
* soft NOFILE 1000000
* hard NOFILE 1000000
Then reboot the host and verify the open files limit with ulimits -a
and cat /proc/memsqld_pid/limits
Check the value of cat /proc/sys/fs/file-max
. If fs.file-max
is less than 1000000 permanently increase it by editing the /etc/sysctl.conf
file as root, and adding the lines:
fs.file-max = 1000000
Then reboot the host and verify the value of fs.file-max
using cat /proc/sys/fs/file-max
For more information about setting the file descriptor limit on Linux, see: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files
Increase Maximum Process Limit
MemSQL also recommends increasing the number of processes allowed to be run by users.
Permanently increase the max user processes limit for all users by editing the /etc/security/limits.conf
file as root, and adding the lines:
* soft NPROC 128000
* hard NPROC 128000
Then reboot the host and verify the value of max user processes using ulimit -a
and cat /proc/memsqld_pid/limits
Configure Linux vm
settings
MemSQL recommends setting the following settings via sysctl
to minimize likelihood of getting memory errors. You can use the /sbin/sysctl
command to view, set and automate settings in the /proc/sys/
directory.
Permanently set these variables by editing the /etc/sysctl.conf
file as root, and adding these lines:
vm.max_map_count=1000000000
vm.min_free_kbytes=500000
vm.swappiness=10
If MemSQL Ops is used to configure vm setting defaults, it will set vm.min_free_kbytes
to the smaller of 1% of system RAM or 4 GB.
Disable transparent huge pages
MemSQL recommends disabling transparent huge pages (THP). Not doing so may result in inconsistent query run times. This troubleshooting article explains how to disable THP.
Create swap files
It is recommended that you create a swap partition (or swap file on a dedicated device) to serve as an emergency backing store for RAM. MemSQL makes extensive use of RAM (especially with rowstore tables), so it is important that the operating system does not immediately start killing processes if MemSQL runs out of memory. Because typical machines running MemSQL have a large amount of RAM (>32 GB/node), the swap space can be relatively small. For example, Ubuntu recommends 6 GB of swap space for 32 GB of RAM and RedHat recommends at least 4 GB of swap space for 8 - 64 GB of RAM.
For more information on size recommendations and to configure swap space, please refer to your distribution’s documentation.
Ensure port 3306 is available on every cluster host, or change the MemSQL port assignment
The default port used by MemSQL is 3306, which is configurable.
If another process is already running on the system and occupying port 3306, you will see an error in MemSQL similar to:
120501 3:04:15 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use
120501 3:04:15 [ERROR] Do you already have another mysqld server running on port: 3306 ?
120501 3:04:15 [ERROR] Aborting
To workaround this, update the MemSQL port
value in the memsql.cnf
file to an available port. For more information on changing the port assignment, see the MemSQL Configuration page.
Thereafter, connect to the database using the new port, i.e. mysql -u root -h 127.0.0.1 -P <PORT> --prompt="memsql> "
.
Recommendations for Optimal On-Premise Columnstore Performance
- We support the EXT4 and XFS filesystems.
- Many improvements have been made recently in Linux for NVMe devices, so we recommend using a 3.0+ series kernel. For example, CentOS 7.2 uses the 3.10 kernel.
- Set the following parameters in Linux (make it permanent in
/etc/rc.local
):
# Set ${DEVICE_NUMBER} for each device
echo 0 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/add_random
echo 1 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/rq_affinity
echo none > /sys/block/nvme${DEVICE_NUMBER}n1/queue/scheduler
echo 1023 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/nr_requests
MemSQL Ops Configuration
Disable requiretty
For the smoothest user experience of automatically provisioning the MemSQL cluster, MemSQL Ops needs the ability to SSH into other hosts outside an interactive shell/session. As such, it requires that requiretty
is disabled. This can be done by modifying the /etc/sudoers
file and commenting out any lines that reference requiretty
.
Ensure working SSH between cluster hosts
For MemSQL Agent to successfully download and install itself in other hosts in the cluster, it needs to connect via SSH from the primary host (where the primary MemSQL Agent resides) to other hosts in the cluster. For the best user experience, check to make sure SSH is possible between the primary host and the other hosts in the cluster. If this is not possible, you will need to install the MemSQL Agent manually in all cluster hosts.
Start a new login shell after running install.sh
The install.sh
script creates a memsql
user and memsql
group within the Linux machine, and adds the user running install.sh
to the memsql
group. To successfully complete the addition of your user into the memsql
group, you will need to restart your shell.
Create the same Linux user with sudo permissions on every cluster host
Setting up MemSQL using the MemSQL Agent is easiest if the user and password is the same for every node. The MemSQL Agent will use this user to connect to all hosts in the cluster and install / deploy MemSQL.
Ensure port 9000 is accessible on every cluster host, or change the MemSQL Ops port assignment
The default port opened by the MemSQL Ops is 9000, which is configurable. For the smoothest user experience, open this port, or the port of your choosing, on every host, and ensure it is accessible via a browser. For deployments on public cloud platforms (e.g. AWS, Azure), make sure the proper security group is configured for your cluster hosts that allow public access to port 9000 or your chosen port.
If you would like to change the port assignment from the default 9000, MemSQL Ops agents can be started on other ports using the --port
flag in memsql-ops start
.
Configure MemSQL for NUMA on machines with multiple NUMA nodes
For optimum performance on multi-socket processors, you should set up and configure MemSQL for Non-Uniform Memory Access (NUMA) as part of your installation process. See Configuring MemSQL for Non-Uniform Memory Access (NUMA) for more information on how to do this.
Ensure /tmp
has free space or change TMPDIR
MemSQL Ops, like many Unix utilities, writes temporary data to /tmp
and requires available free space.
It is possible to change the temporary directory by setting the canonical Unix environment variable TMPDIR
.