This section lists the best practices for properly configuring your MemSQL installation.
MemSQL Database Configuration
Configure Linux ulimit
settings
Most Linux operating systems provide ways to control the usage of system resources such as threads, files and network at an individual user or process level. The per-user limitations for resources are called ulimits, and they prevent single users from consuming too much system resources. For optimal performance, MemSQL recommends setting ulimits to higher values than the default Linux settings. The ulimit settings can be configured in the /etc/security/limits.conf
file, or directly via shell commands.
Increase File Descriptor Limit
The MemSQL cluster uses a substantial number of client and server connections between aggregators and leaves to run queries and cluster operations. We recommend setting the Linux file descriptor limit to the highest possible value (at least 64,000) to account for these connections. Failing to increase this limit can significantly degrade performance and even cause connection limit errors.
Permanently increase this limit for all users by editing the /etc/security/limits.conf
file as root, and adding the lines:
* soft NOFILE 1000000
* hard NOFILE 1000000
Alternatively, you can set the value for your session by running the following command in your shell:
ulimit -n 1000000
For more information about setting the file descriptor limit on Linux, see: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files
.
Increase Maximum Process Limit
MemSQL also recommends increasing the number of processes allowed to be run by users.
Permanently increase this limit for all users by editing the /etc/security/limits.conf
file as root, and adding the lines:
* soft NPROC 128000
* hard NPROC 128000
Alternatively, you can set the value for your session by running the following command in your shell:
ulimit -u 128000
Configure Linux vm
settings
MemSQL recommends setting the following settings via sysctl
to minimize likelihood of getting memory errors. You can use the /sbin/sysctl
command to view, set and automate settings in the /proc/sys/
directory.
Permanently set these variables by editing the /etc/sysctl.conf
file as root, and adding these lines:
vm.max_map_count=1000000000
vm.min_free_kbytes=500000
vm.swappiness=10
If MemSQL Ops is used to configure vm setting defaults, it will set vm.min_free_kbytes
to the smaller of 1% of system RAM or 4 GB.
Disable transparent huge pages
MemSQL recommends disabling transparent huge pages (THP). Not doing so may result in inconsistent query run times. This troubleshooting article explains how to disable THP.
Create swap files
It is recommended that users explicitly configure swap files.
It is recommended that you create a swap file so that the operating system does not immediately start killing processes if MemSQL runs out of memory. Here is a sample shell script to create the swap file:
snapshot_dir="/var/lib/memsql/data/snapshots"
snapshot_file="$snapshot_dir/memsql.swp"
mkdir -p $snapshot_dir
dd if=/dev/zero of=$snapshot_file bs=1M count=10240
chmod 600 $snapshot_file
mkswap $snapshot_file
swapon $snapshot_file
echo "$snapshot_file swap swap defaults 0 0" | tee -a /etc/fstab
For more information on configuring swap files, see: http://www.cyberciti.biz/faq/linux-add-a-swap-file-howto/
.
Ensure port 3306 is available on every cluster host
The default port used by MemSQL is 3306, which is configurable. For the smoothest user experience, make sure this port is accessible and unoccupied.
If another process is already running on the system and occupying port 3306, you will see an error in MemSQL similar to:
120501 3:04:15 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use
120501 3:04:15 [ERROR] Do you already have another mysqld server running on port: 3306 ?
120501 3:04:15 [ERROR] Aborting
To workaround this, update the MemSQL port
value in the memsql.cnf
file to an available port.
Thereafter, connect to the database using the new port, i.e. mysql -u root -h 127.0.0.1 -P <PORT> --prompt="memsql> "
.
Recommendations for Optimal On-Premise Columnstore Performance
- We recommend the P3600 or P3700 Intel SSDs. Update the devices to the latest firmware using the Intel SSD Data Center Tool.
- We support the EXT4 filesystem using the
discard
andnoatime
mount options. We currently do not support XFS. - Many improvements have been made recently in Linux for NVMe devices, so we recommend using a 3.0+ series kernel. For example, CentOS 7.2 uses the 3.10 kernel.
- Set the following parameters in Linux (make it permanent in
/etc/rc.local
):
# Set ${DEVICE_NUMBER} for each device
echo 0 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/add_random
echo 1 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/rq_affinity
echo none > /sys/block/nvme${DEVICE_NUMBER}n1/queue/scheduler
echo 1023 > /sys/block/nvme${DEVICE_NUMBER}n1/queue/nr_requests
MemSQL Ops Configuration
Disable requiretty
For the smoothest user experience of automatically provisioning the MemSQL cluster, MemSQL Ops needs the ability to SSH into other hosts outside an interactive shell/session. As such, it requires that requiretty
is disabled. This can be done by modifying the /etc/sudoers
file and commenting out any lines that reference requiretty
.
Ensure working SSH between cluster hosts
For MemSQL Agent to successfully download and install itself in other hosts in the cluster, it needs to connect via SSH from the primary host (where the primary MemSQL Agent resides) to other hosts in the cluster. For the best user experience, check to make sure SSH is possible between the primary host and the other hosts in the cluster. If this is not possible, you will need to install the MemSQL Agent manually in all cluster hosts.
Start a new login shell after running install.sh
The install.sh
script creates a memsql
user and memsql
group within the Linux machine, and adds the user running install.sh
to the memsql
group. To successfully complete the addition of your user into the memsql
group, you will need to restart your shell.
Create the same Linux user with sudo permissions on every cluster host
Setting up MemSQL using the MemSQL Agent is easiest if the user and password is the same for every node. The MemSQL Agent will use this user to connect to all hosts in the cluster and install / deploy MemSQL.
Ensure port 9000 is accessible on every cluster host
The default port opened by the MemSQL Ops is 9000, which is configurable. For the smoothest user experience, open this port on every host, and ensure it is accessible via a browser. For deployments on public cloud platforms (e.g. AWS, Azure), make sure the proper security group is configured for your cluster hosts that allow public access to port 9000.
If it is not possible to open port 9000, MemSQL Ops agents can be started on other ports using the --port
flag in memsql-ops start
.
Install numactl on machines with multiple sockets
For optimal performance, MemSQL Ops automatically detects if a machine has multiple sockets and recommends that MemSQL be deployed in a NUMA-aware manner. Specifically, MemSQL Ops will run numactl
commands to bind individual MemSQL nodes to CPUs. This allows faster access to in-memory data, since individual MemSQL nodes only access data that’s collocated with their corresponding CPU. For MemSQL Ops to enable NUMA, you need to install the numactl
package in your Linux host:
sudo apt-get install numactl
Ensure /tmp
has free space or change TMPDIR
MemSQL Ops, like many Unix utilities, writes temporary data to /tmp
and requires available free space.
It is possible to change the temporary directory by setting the canonical Unix environment variable TMPDIR
.