Outdated Version

You are viewing an older version of this section. View current production version.

SingleStore DB FAQ

This page addresses some frequently asked questions about SingleStore DB.

General

What are license units and how do they apply to my cluster?

Previous versions of SingleStore DB used available RAM for pricing and to define the boundary between free and enterprise licenses. If you already have one of these RAM-based licenses, you can continue to use it when upgrading to newer versions of SingleStore DB; however, MemSQL 6.7.13 and later now support licenses based on the concept of units. All new licenses generated in the SingleStore Customer Portal will be unit-based licenses.

A unit is a combination of compute (8 vCPU or less) and RAM (32 GB or less) and only applies to host machines that run leaf nodes in a cluster. Note: SingleStore DB counts the number of virtual cores available to a processor, regardless of whether hyper-threading is enabled or not.

The size of the host machine (physical or virtual) determines how many license units will be counted when you add leaf nodes to that host machine to your cluster. The total license units for a host machine are calculated by taking the total RAM of the host machine divided by 32, or the total number of vCPUs divided by 8, and rounding up. Whichever is the greater value (RAM or vCPU) is the number of potential license units on that host machine. Because this only applies to leaf nodes in your cluster, aggregator nodes are not part of the license unit calculation and can be deployed on any quantity or size of host machines that meet the minimum hardware requirements for installation.

To help understand how license units are calculated, the following table provides examples of how many license units would be counted for various AWS EC2 instance types:

Host Size RAM vCPU License Units
m5.2xlarge 32 8 1
c5.2xlarge 16 8 1
i3.2xlarge 61 8 2
r5.2xlarge 64 8 2
m5.4xlarge 64 16 2
m4.10xlarge 160 40 5

You can artificially constrain or limit the number of license units to a subset of the resources available by lowering the resources available to a host machine that you are deploying a leaf node onto. For example, if you have a physical machine with 16 vCPU and 48 GB of RAM, SingleStore DB will consider you to have two potential license units if you used that machine as a host. To artificially constrain your capacity, deploy SingleStore DB onto a virtual machine instead with fewer resources to lower your license unit count per host machine (which would be considered the virtual machine).

Where can I go for help if I do not have access to paid support?

  1. SingleStore Forums. To ask questions and share knowledge with other members of the community, join our public forums at www.singlestore.com/forum. SingleStore employees will also monitor the forums during business hours in California.

  2. Ask StackOverflow. We and other SingleStore DB experts are active on the StackOverflow community. If you can’t find what you’re looking for in our documentation, just ask a question. Make sure to tag it with #singlestore!

What is the relationship between SingleStore DB and MySQL?

SingleStore DB is a standalone distributed database that is compatible with MySQL client software.

SingleStore DB is not a storage engine or other component of MySQL, nor is MySQL a component of SingleStore DB, and the SingleStore DB and MySQL servers do not share any code. The SingleStore DB server includes its own storage engine and SQL-based execution engine built around scalable distributed execution, lock-free data structures, and machine code generation.

Users use MySQL client software to connect to the SingleStore DB server. SingleStore DB uses the same wire protocol as MySQL and supports similar SQL syntax as MySQL for compatibility with MySQL client software.

Why do I see the MemSQL version as 5.5.8, even though that’s not the version of MemSQL I installed?

For compatibility with MySQL client software, MemSQL reports the version engine variable as 5.5.8. Client drivers look for this version to determine how to interact with the server.

This is not the SingleStore DB server version. The SingleStore DB server version can be found in the memsql_version engine variable, which can be queried like SELECT @@memsql_version.

As a result, certain MySQL clients may display the server version as 5.5.8, even though this is not the MemSQL version. For example, the mysql command line client may display

Server version: 5.5.8 MemSQL source distribution (compatible; MySQL Enterprise & MySQL Commercial)

when you connect to SingleStore DB.

Is SingleStore DB vulnerable to MySQL server security issues?

No. The SingleStore DB and MySQL servers are separate database engines which do not share any code, so security issues in the MySQL server are not applicable to SingleStore DB.

SingleStore DB implements the MySQL wire protocol in order to provide compatibility with MySQL client software. A part of this compatibility involves reporting a MySQL version number as described above. As a result of this, numerous third-party security scanners that use the reported version number as a vulnerability marker report false positives, as they’re looking at a version number, not at the actual presence of a vulnerability. Our engineering team as well as a third-party security analysis have confirmed that SingleStore DB is not susceptible to these MySQL vulnerabilities.

Is SingleStore DB a row-based or column-based store?

SingleStore DB provides both in-memory row-based and on-disk column-based stores.

The in-memory row store:

  • Works best for mixed transactional and analytical workloads
  • Provides low latency and highly concurrent reads and writes of individual rows as well as sophisticated analytical SQL queries
  • Supports PRIMARY and UNIQUE keys
  • Supports geospatial indexes
  • Has longer recovery times (as the entire table needs to be loaded into memory)

The on-disk column store:

  • Works best for analytical workloads
  • Allows tables larger than the amount of available RAM in the cluster
  • Uses compression (which lowers disk usage and accelerates replication)
  • Provides fast and efficient scans of large datasets
  • Provides sorted columnstore indexes
  • Is optimized for batch UPDATE and DELETE queries
  • Requires more expensive query compilation (as compared to the row store)

How does SingleStore DB’s in-memory lock-free storage engine work?

SingleStore DB’s storage engine uses multi-version concurrency control with lock-free skip lists and lock-free hash tables which allow highly concurrent reads and writes at very high throughput. Reads in SingleStore DB are never blocked, but updates to the same row can conflict with logical locks.

What is the advantage of SingleStore DB over traditional databases like Oracle, SQL Server or MySQL with data in a “ramdisk” or large buffer pools?

Two common techniques for leveraging large amounts of memory in traditional databases are storing data files on a “ramdisk” or running a disk-based storage engine with a large buffer pool. For example, MySQL performs much better with InnoDB configured to use a large buffer pool.

While running an existing storage engine like InnoDB in memory can alleviate some of the bottlenecks involved with disk, SingleStore DB has four distinguishing memory-optimized features that enable it to perform significantly better than disk-based storage engines running in memory:

  • SingleStore DB is a distributed scale-out system. SingleStore DB scales to thousands of machines on commodity hardware.
  • No buffer pool. Traditional databases manage a global buffer pool since they assume that the dataset can’t fit into memory. The buffer pool is a resource shared across all databases and all tables, which creates significant contention.
  • Lock-free data structures. SingleStore DB uses memory-optimized, lock-free skip lists and hash tables as its indexes. Unlike B+ Trees, these data structures are designed from the ground up to be fast in memory.
  • Code generation. Lock-free data structures are so fast that dynamic SQL interpretation quickly becomes the limiting factor for query execution. With code generation, SingleStore DB compiles SQL down to native code for maximum performance.

What is the advantage of SingleStore DB over other distributed databases?

  • Full SQL. SingleStore DB supports full SQL and transactional semantics.
  • Storage pyramid. SingleStore DB combines row and column store engines tuned for memory and flash storage.
  • Scales on commodity hardware. SingleStore DB doesn’t require exotic hardware and can run on premises or in the cloud.
  • Enterprise ready. SingleStore DB supports a large number of enterprise security and manageability features.

What is SingleStore DB not for?

SingleStore DB excels at real-time and high throughput query use cases. It is a great general purpose database for running both transactional and analytic workloads. However, there are use cases which SingleStore DB is not designed to run. Some of these are listed below:

  • Object store. SingleStore DB is not designed to be a blob store or “data lake”. It is designed for high value data that is structured or semi-structured and ready to query. SingleStore DB has open-source connectors for integrating with a variety of great object stores, including Amazon S3 and Hadoop File System (HDFS). See Data Loading for more information.

  • Running on low hardware. SingleStore DB is not designed to run on “micro instances”, mobile phones or other low-powered computers. It is designed to run on machines with at least 4 cores and 8GB of RAM. The easiest way to run SingleStore DB for development is to use the SingleStore DB Cluster-in-a-Box Docker deployment option; see Single-host.

  • In-process database. SingleStore DB is not run as a library or in-process with an application. SingleStore DB is a distributed database which runs in separate processes from the application, and applications connect to SingleStore DB via a client driver.

  • Serializable transactions. SingleStore DB supports extremely fast, distributed “READ-COMMITTED” transactions, but it is not suitable for applications which require “SERIALIZABLE” transactions.

Query Compilation

Why does CREATE TABLE take so long in SingleStore DB?

In order to speed up the compilation of queries in SingleStore DB, CREATE TABLE will precompile code used to access and update table indexes. This is a one-time operation for each unique table schema, and compiled table code will be cached on disk for future uses.

Why do SingleStore DB queries typically run faster the second time they are executed?

Traditional relational database management systems interpret SQL queries the same way interpreters for languages like Python and Ruby run programs. The first time a SingleStore DB server encounters a given query shape, it will optimize and compile the query for future invocations. This incurs overhead which does not depend on the amount of data to be processed, but rather the complexity of the query. The process of code generation involves extracting parameters from the query then transforming the normalized query into a SingleStore DB-specific intermediate representation tailored to the system. Subsequent requests with the same shape can reuse this plan to complete both quickly and consistently. Starting with MemSQL 5, SingleStore DB embeds an industrial compiler (LLVM) for code generation, leading to fast query performance for even the first time queries are run.

How much can you change a query before it needs to be recompiled?

If you only change an integer or string constant in a query, it will not require recompilation.

SingleStore DB strips out numeric and string parameters from a query and attaches the resulting string to a compiled plan. This string is referred to as a parameterized query. For example, SELECT * FROM foo WHERE id=22 AND name='bar' is parameterized to SELECT * FROM foo WHERE id=@ AND name=^.

You can list the distinct parameterized queries corresponding to all executed queries by running SHOW PLANCACHE.

The one exception to this rule is constants in the projection clause without an alias. These constants are compiled directly into the plan’s assembly code for performance reasons. For example, SELECT id + 1, id + 2 AS z FROM foo is converted to SELECT id + 1, id + @ AS z FROM foo.

Durability

What is the durability guaranteed by SingleStore DB?

MemSQL provides several options to control the tradeoffs between performance and durability (see Replication and Durability). In its most durable state, MemSQL will not lose any transactions that have been acknowledged.

Can I configure SingleStore DB to be fully durable?

Yes, you can configure the durability through the WITH {SYNC | ASYNC} DURABILITY option in the CREATE DATABASE and RESTORE DATABASE commands.

Does being in-memory mean that SingleStore DB will lose all data upon system failure or restart?

No. Unlike traditional relational database management systems, SingleStore DB uses RAM as the primary storage for data. However, SingleStore DB continuously backs up data to disk with transaction logs and periodic snapshots. These features can be tuned all the way from synchronous durability (every write transaction is recorded on disk before the transaction completes) to purely in-memory durability (maximum sustained throughput on writes).

On restart, SingleStore DB uses the snapshot and log files to recover its state to what it was before shutting down. Because the recovery process is parallelized across CPUs, the bottleneck in this process is the sequential hard drive speed.

See Using Durability and Recovery for more information.

If SingleStore DB writes data to disk, how can it be faster than disk-based databases?

Traditional relational database management systems use disk as the primary storage for data and memory as a cache. Managing this caching layer adds bookkeeping overhead and contention thus reducing throughput and concurrency. These constraints result in random read and write I/O, which puts significant pressure on both rotational and solid state disks.

On the other hand, SingleStore DB stores data primarily in memory and backs it up to disk in a compact format. As a result, SingleStore DB uses only sequential I/O and the transaction log size is significantly smaller. This I/O pattern is optimized for both rotational and solid state disks. Furthermore, reads in SingleStore DB can use memory-optimized lock-free skip lists and hash tables that cannot be managed in a buffer pool.

What isolation levels does SingleStore DB provide?

SingleStore DB provides the “READ COMMITTED” isolation level. This guarantees that no transaction will read any uncommitted data from another transaction. Furthermore, once a change is observed in one transaction, it will be visible to all future transactions.

Unlike the “REPEATABLE READ” or “SNAPSHOT” isolation level, “READ COMMITTED” isolation level does not guarantee that a row will remain the same for every read query in a given transaction. Applications that use SingleStore DB should take this into account.

Even though regular transactions use “READ COMMITTED” isolation level, backups created using the BACKUP command use “SNAPSHOT” isolation level.

Deployment and Management

How can I get a copy of SingleStore DB?

Sign into our Customer Portal and use the free license to test out SingleStore DB. We offer:

  • A 30-Day Free Enterprise Trial (trial key available through the portal).
  • A license that allows free use of SingleStore DB up to four license units.

Does SingleStore DB run in the cloud?

Yes. You can run a self-managed cluster in the cloud, or use our SingleStore Managed Service.

Does SingleStore DB run on Windows?

Yes, the SingleStore DB Quick Start “cluster-in-a-box” packaged via Docker can run on Windows. A SingleStore DB “cluster-in-a-box” is a master aggregator and leaf node collocated on a single host. This configuration is intended to provide familiarity with SingleStore DB but is not recommended for production.

SingleStore DB production environments run on modern 64-bit Linux versions, as described in the System Requirements section.

You can also run SingleStore DB on Microsoft Azure, using Azure Linux instances.

See System Requirements.

On which Linux distribution does SingleStore DB run best?

SingleStore DB is developed and tested most extensively on CentOS 6.x, 7.x, and Debian 8 and 9. See System Requirements for the full list of Linux distributions that are officially supported.

How much disk space should I allocate for SingleStore DB?

SingleStore DB uses disk for three types of storage:

  • Snapshot and log files that backup row store data. You should allocate about as much space on disk for this purpose as memory on your machine.
  • Compressed columnstore data files that contain column store data in SingleStore DB.
  • Object files that are the result of code generation. This includes for Data Definition Language (DDL) queries like CREATE TABLE and ALTER TABLE and for Data Manipulation Language (DML) queries like INSERT, UPDATE, DELETE and SELECT. On average, these usually require about 0.1 MB per unique plan.

Therefore, you should allocate roughly the amount of memory on your machine + space for compressed column store data + 0.1 MB for each plan. Note that the exact disk requirements will vary with the application, so it is advisable (and usually cheap) to allocate some extra disk space.

What happens if I run out of memory?

If the amount of memory used by row store tables (Alloc_table_memory from SHOW STATUS EXTENDED) is greater than the maximum_table_memory global variable (from SHOW GLOBAL VARIABLES), SingleStore DB will refuse to start new write queries (INSERT, UPDATE and LOAD DATA). Note that DELETE queries are not affected by this limit.

If a currently running query runs out of memory, it will rollback and notify the client of the error. See Memory Management for more information.

What happens if I run out of disk space?

If the amount of available disk space (in the <SINGLESTORE-DB-HOME>/data directory) is less than the minimal_disk_space global variable (from SHOW GLOBAL VARIABLES), SingleStore DB will refuse to start new write queries (INSERT, UPDATE and LOAD DATA). Note that DELETE queries are not affected by this limit, and the database will remain online for reads.

If a currently running write query exhausts the available disk space before making its changes durable, it will wait until more disk space becomes available before continuing. Queries may appear to “hang” when this happens. To determine how many queries and background threads are waiting for disk space run SHOW STATUS EXTENDED LIKE 'Threads_waiting_for_disk_space'.

SingleStore Tools

What are SingleStore Tools?

SingleStore Tools is a collection of utilities that help customers easily deploy, monitor, and manage clusters. Our team has been hard at work, continually adding new features and functionality to our tools to make deploying and managing a cluster easier than ever.

SingleStore Tools replaces our legacy management tool, MemSQL Ops. The singlestoredb-toolbox package in SingleStore Tools replaces the functionality provided by MemSQL Ops, and the singlestoredb-studio package replaces the functionality provided by the MemSQL Ops UI.

How do I switch over from MemSQL Ops to SingleStore Tools?

Follow this guide to perform an online migration of your existing cluster.

Can I still use MemSQL Ops?

MemSQL Ops is currently deprecated and the formal end-of-support date is May 13, 2021. Additionally, you may not upgrade to a SingleStore DB version 7.5 (future release), without a cluster that is managed via SingleStore Tools.

When do I have to switch to Tools?

If you want to upgrade to SingleStore DB version 7.5 (future release), you must switch to managing your cluster via SingleStore Tools. Additionally, we recommend that you switch over before the end-of-support date of May 13, 2021.

Clustering

How does SingleStore DB shard tables?

Every distributed table (except reference tables, which are replicated in whole on each “leaf” node) has a SHARD KEY that specifies which columns of a row to hash to determine what partition a row should reside in. When rows are inserted into a sharded table, they are hashed by the table’s shard key and sent to the leaf carrying the corresponding partition. This technique is commonly referred to as hash-based partitioning. You can choose how to shard each table by specifying its SHARD KEY as part of the CREATE TABLE statement. See Shard Keys for more details.

What are aggregator and leaf nodes?

SingleStore DB stores and computes data on leaf nodes. You can linearly scale both storage and computational power by adding more leaf nodes. Clients query an aggregator node, which in turn queries one or more leaf nodes to collect the rows required to execute the query. Multiple aggregators nodes perform the same functions with respect to executing Data Manipulation Language (DML) queries and allow clients to load-balance queries across the aggregators. Leaf nodes should not be queried directly except for maintenance purposes in exceptional situations.

What is a “master aggregator”?

The Master Aggregator is an aggregator responsible for executing DDL and clustering operations (e.g. ADD LEAF ... or CREATE TABLE...).

What happens if the master aggregator crashes?

If the Master Aggregator becomes unresponsive, clients can continue to execute DML queries (e.g. INSERT and SELECT) against the other aggregators, but DDL and clustering operations can not be performed until the master aggregator is revived or another aggregator is “promoted” to be the master aggregator.

How do I add nodes to SingleStore DB?

Using the new SingleStore management tools, you can add new nodes and assign them roles in your cluster. To see how to do that in a deployment, visit our SingleStore Tools reference.

From within MemSQL Ops, select the “Add Host” button in the top right to register a physical or virtual machine with MemSQL Ops. Then, provision it as an aggregator or leaf by selecting the adjacent button “Add Node”. To increase the capacity of the cluster, provision new leaf nodes. To increase the data loading bandwidth or load balancing bandwidth to the cluster, provision new aggregator nodes.

How many aggregator and leaf nodes do I need?

SingleStore DB stores data in leaf nodes, so you need enough leaf nodes to store all your data in memory. If you are replicating data (redundancy level 2), you need twice as many leaf nodes.

The recommended number of aggregators depends on your use case. If, for instance, your cluster is being used for more than one type of workload (for example, it is the backend for a web application and also being queried by analysts), it is probably best to have multiple aggregators, or pools of aggregators, for these separate workloads. Aside from distribution of workload, the most significant factor to consider is network bandwidth. As a rule of thumb, clusters with 50 nodes or fewer should have about a 5:1 leaf to aggregator ratio. Clusters with more than 50 nodes can have closer to a 10:1 leaf to aggregator ratio. Note that you can also add nodes to a cluster to tune performance after it is up and running.

The appropriate ratio of aggregators to leaves also depends on the type of workload running. Transactional workloads that run many small queries or queries that involve only a single partition require more aggregators, since those queries interact with one aggregator and one leaf. Analytical workloads,especially those involving distributed joins, require fewer aggregators because almost all the work is performed on the leaves.

Can I query the leaf nodes individually?

Yes, but this is not recommended. It only should be done in troubleshooting scenarios.

After I ran a single host install, why are there 2 SingleStore DB nodes (or 4 SingleStore DB processes) running?

This is the “single host cluster” setup. Your machine is running both a master aggregator process and a leaf process, and hence “2 SingleStore DB Nodes” is shown (each SingleStore DB instance or node contains two processes). Make sure to send your queries to the aggregator.

Query Execution

Can I JOIN multiple sharded tables in a query?

Yes. SingleStore DB supports advanced join capabilities and will automatically redistribute data as necessary to complete a query. SingleStore DB can also take advantage of collocated data across shard keys and reference tables to reduce data movement. See Distributed Joins.

Can I optimize a distributed join involving a small, static table?

Yes, a small table which does not change frequently can be made into a reference table, which is replicated to all the leaf nodes. This ensures that the table does not need to be moved when joined against, at the cost of using more memory. See Distributed SQL.

Why do I get errors about UNIQUE KEYs?

SingleStore DB does not support unique keys unless the rows in the unique key are a superset of the rows in the shard key. For more information about the shard key, see Distributed SQL.

Import and Backup

How can I backup a SingleStore DB database?

SingleStore DB supports consistent, online, cluster-wide BACKUP and RESTORE operations that do not require blocking write operations on the database like mysqldump does. See Backing up and Restoring Data section for more information.

How can I import data from MySQL, Postgres, MS-SQL etc?

See How To Load Data Into SingleStore DB.

How can I easily copy a table?

You can use CREATE TABLE dest AS SELECT * FROM source. See CREATE TABLE.

Or, create a new empty table using the schema of the original table from SHOW CREATE TABLE source and copy data from source table into the new table using INSERT INTO dest SELECT * FROM source.

How can I easily copy a database?

There are two options:

  • Use replication. Run REPLICATE DATABASE dest_db FROM user@host:port/src_db, and after it fully synchronizes run STOP REPLICATING dest_db. See Replication .
  • BACKUP the database and RESTORE it on the same (or a different) cluster under a different name.

Where are the important data files (recovery log, binary logs, snapshots, data files etc)?

Run SHOW STATUS EXTENDED LIKE '%_directory' to get the full paths.

SingleStore DB and Spark

SingleStore DB and Apache Spark are both distributed, in-memory technologies. SingleStore DB is a SQL database, while Spark is a general computation framework. SingleStore DB has tight integration with Apache Spark through its SingleStore Spark Connector offering. For instance, with SingleStore DB and Spark clusters deployed, users can extract data from real-time sources such as Kafka, run the data through a Spark machine learning library model, and store the model result into SingleStore DB to be persisted and queryable.

What are the differences between SingleStore DB and Spark SQL?

  • Spark SQL treats datasets (RDDs) as immutable - there is currently no concept of an INSERT, UPDATE, or DELETE. You could express these concepts as a transformation, but this operation returns a new RDD rather than updating the dataset in place. In contrast, SingleStore DB is an operational database with full transactional semantics.
  • SingleStore DB supports updatable relational database indexes. The closest analogue in Spark is IndexRDD, which is currently under development, and provides updatable key/value indexes.

You can connect SingleStore DB to Spark with the SingleStore Spark Connector. The SingleStore Spark Connector 3.0 is the latest GA version.

SQL push down

What happens if SQL push down fails?

The SingleStore DB Connector takes a best effort approach towards query push down. While Spark is preparing the query for execution, the SingleStore DB push down strategy attempts to push down every subtree starting with the entire query. If anything fails, we simply leave the tree as is and Spark handles executing the unsupported section of the tree.

How can I check to see if a query is pushed down?

Every DataFrame has a method called .explain which will print the final plan before execution. If the first element in that plan is a MemSQLPhysicalRDD then the DataFrame has been fully pushed down.

What SQL push downs are not supported?

We are constantly improving push down, so the best thing to do is just try your query and then use .explain to check to see if it got pushed down. If you find a query which is not pushed down, please raise an Github issue on the Connector repo.

Miscellaneous

Does SingleStore DB support compression?

Yes, column store tables are compressed by default.

Does SingleStore DB perform random IO?

No, only sequential IO is used by SingleStore DB.

What are the index types SingleStore DB supports?

The in-memory row store supports skip lists, hashtables and geospatial indexes. The on-disk column store supports ordered columnstore indexes.