Outdated Version

You are viewing an older version of this section. View current production version.

6.7 Release Notes

The MemSQL 6.7 release is focused on usability, compatibility and performance. With respect to usability, we’ve made major enhancements in our toolset for deploying and managing MemSQL, understanding how tables are being used to help you tune your table structures, automatic sampling for improved query optimization, resource management, and more. Compatibility functions have been added for date handling and regular expression matching to make it easier to port queries from legacy databases. And performance of data loading and star join query processing have been improved dramatically.

Beginning with this release, MemSQL may be used free of charge in production for clusters with up to four license units per cluster.

See the descriptions and changelog below for more information on these new features.

Query Execution

The following query execution features have been added:

  • Star join query execution performance has improved through addition of support for joining on integer columns in columnstore tables, using Intel Single-Instruction Multiple-Data (SIMD) instructions and operations directly on encoded (compressed) data. Some join/group by/aggregate queries have improved over 100X.
  • Faster filtering on columnstores
  • Improved IN-list filter performance

Query Optimization, Compilation and Statistics

The following query optimizations have been added in this release:

  • Cardinality estimation (CE) improvements for IN-lists using histograms
  • Fast histogram creation on row store tables with random sampled scan
  • Fast, accurate, uniform random sampling for complex predicates CE on row store tables using random sampled scan
  • interpret_first setting to improve ad hoc query (first run) query performance (experimental)
  • OPTION clause for global query options, applicable to SELECT, INSERT…SELECT, UPDATE, DELETE
  • Added warnings in EXPLAIN and information_schema for queries with comparisons between mismatched datatypes which may involve unsafe datatype conversions and/or degrade performance.
  • New management views (MV_QUERY_COLUMN_USAGE, MV_AGGREGATED_COLUMN_USAGE) to track how columns are used in queries to help with index tuning and application design
  • Improvements to query compilation speed

Functional Extensions

The following new functional capabilities are supported:

  • Dynamic SQL
  • Regular expression support for advanced regular expressions (ARE) and common Perl regular expression extensions. Added functions regexp_instr() and regexp_replace().
  • CUBE, ROLLUP, and GROUPING()
  • nvl(), to_char(), to_timestamp(), decode(), and strcmp() functions
  • APPROX_COUNT_DISTINCT_* functions based on HyperLogLog algorithm to allow persisting and rolling up approximate count distinct state for fast count distinct estimates
  • UNION/UNION ALL on the right side of IN
  • SELECT FOR UPDATE to allow lock stability between statements for online transaction processing workloads
  • MySQL binary protocol and prepared statements
  • Fast random sampled scans on row store tables using table_name WITH(sample_ratio = <value>)

Tools

  • Next generation of command-line tools that lay a foundation for flexible, efficient deployment and management of MemSQL clusters using industry-standard tools like Puppet, Chef, Ansible, and Salt
  • memsql-deploy tool for those who wish to deploy with the MemSQL-provided toolset
  • Fully-supported APT and YUM repositories for secure distribution of MemSQL packages. New toolchain is distributed and installed via the RPM and DEB file format.
  • memsql-toolbox, a stateless tool for administering MemSQL clusters, with performance and reliability improvements over the previous-generation memsql-ops tool
  • New customer portal providing a single point of access for license management, downloading MemSQL software, discussion forums, and support
  • MemSQL Studio graphical management console upgrades including 6.7 support and graphical EXPLAIN and PROFILE plan display

System Management

  • Resource governor extensions to control CPU usage and concurrency
  • Machine-learning-based workload management extensions to estimate resources used by queries the first time they are run and queue large queries appropriately to maintain throughput and response time under load
  • Management view to show columnstore merge status
  • Extensions to information_schema views to track status of statistics (last update and automatic statistics status)
  • Additional cluster events tracked in mv_events view
  • New MV_CLUSTER_STATUS_VIEW to give cluster status in query-able form
  • Added memory-related information to MV_NODES view

Data Loading, Backup and Restore

  • Adaptive data compression for LOAD DATA (up to 2X performance gain)
  • Tar backup to Amazon S3
  • Backup/restore to/from Azure Blob Store
  • Enhanced metadata for backups in MV_BACKUP_HISTORY view as well as backup files
  • Enhanced validation prior to restore or disaster recovery replicate operations, to ensure there is enough space on the target system
  • SELECT INTO OUTFILE on S3
  • JSON and Avro support for LOAD DATA and pipelines

Maintenance Release Changelog

2020-04-13 Version 6.7.29

  • Avoid inaccurate sampling estimates for expressions that use CURRENT_SECURITY_ROLES().
  • Fixed a locking issue that could cause dropping a temporary table to fail.

2020-03-18 Version 6.7.28

  • Fixed a crash that occurred during query rewrites for complex UPDATE and SELECT queries having subselects.
  • Queries inside of stored procedures that use variables run as single partition queries if the queries only need to access data within single partitions. Previously, these types of queries did not run as single partition queries.
  • Fixed a deadlock that is possible when the RESTORE DATABASE command fails and attempts to remove the restored database partitions that were created before the command failed.
  • No longer allow multiple leaf nodes to be added using host names or IPs that resolve to the same machine.
  • Fixed an issue where specific queries would not properly account for the number of available threads.
  • Fixed an error that occurred when a plan was dropped from the plancache during a distributed join.

2020-02-24 Version 6.7.27

  • Fixed an error that occurred when SELECT ... GROUP BY ... ORDER BY RAND() was run on a columnstore table.
  • Fixed an issue where cross-database distributed joins could fail to cleanup the temporary result tables created to run the join.
  • Fixed a hang that occurred during an ALTER TABLE if more than max_connection_threads queries are blocked from running on a child aggregator during the phase of ALTER TABLE when queries are blocked.
  • Fixed a memory leak in queries that use full-text indexes.
  • Fixed an issue with a specific INSERT ... SELECT ... UPDATE query that caused a crash instead of returning an error.
  • Fixed an issue that may cause pipeline extractor subprocesses to hang.
  • Fixed hung queries that resulted from invalid UTF8 characters.
  • Fixed an issue that could occur while replicating user/grant metadata within a cluster. The issue could cause incorrect permission checks if the sync_permissions engine variable is enabled.

2019-11-25 Version 6.7.26

  • Added the performance_schema read-only engine variable. This allows newer MySQL Connector/J and JDBC drivers to connect to MemSQL.
  • Now, ALTER TABLE and TRUNCATE TABLE only wait for garbage collection to complete if a previous operation triggered garbage collection on the same table. Previously, ALTER TABLE and TRUNCATE TABLE waited for garbage collection to complete on any table.
  • Now, improve performance of garbage collection on the plancache, when the plancache is large. Garbage collection on the plancache is triggered when the plan_expiration_minutes engine variable expires.
  • Fixed an issue causing the workload manager to share the same queue for memory intensive, connection intensive, and thread intensive queries. Now, the workload manager uses a queue for memory intensive queries and another queue for connection and thread intensive queries.
  • Fixed an issue that caused resource pools to gather incorrect memory use statistics for some queries.
  • Fixed a crash that occurred when a stored procedure used by a pipeline writes to a different database than where the pipeline is located.
  • Now, you can run CREATE TEMPORARY TABLE concurrently, on multiple connections, against the same table. The same applies to DROP TEMPORARY TABLE.
  • Added support for some SHOW CHARACTER SET WHERE ... clauses. This is needed for MySQL Workbench 8.0 support.

2019-10-21 Version 6.7.25

  • Fixed an issue where file descriptors were sometimes left open when processes external to the engine (such as pipeline processes) encounter errors.
  • Fixed an issue where no error was returned when large negative row-ids were present in an internal table. These row-ids are used when loading data using pipelines.

2019-09-30 Version 6.7.24

  • Fixed an issue where running SHOW PLANCACHE or querying information_schema.PLANCACHE could cause a server crash if the engine runs out of memory while performing these operations.
  • Fixed an issue where running KILL QUERY on an ALTER VIEW query could cause new queries accessing the view to go into an infinite loop.
  • Fixed the permission check on CREATE TABLE ... SELECT queries to correctly check for the SELECT permission type on the tables being selected. Prior to the fix, the tables being selected were checked for the permission type required by the previous compiled query.
  • Fixed an issue where a user could run GRANT <privileges> ON <resources> TO <role> if the user didn’t have the privileges being granted. Now, this GRANT query fails if the user doesn’t have these privileges.
  • Fixed an issue where cluster-wide permissions could be granted, at the database level, to roles. These permissions can no longer be granted at the database level.
  • Fixed an issue where TRUNCATE TABLE caused unrecoverable replica partitions.

2019-08-14 Version 6.7.23

  • Now, improve the performance of restoring a database that contains many small columnstore BLOB files.
  • Fixed a crash that occurred when LOAD DATA encountered an error on a line of input whose text is larger than 100 MB.
  • Fixed an issue where a global sync variable that was set on the master aggregator would not sync to other nodes in the cluster.
  • Changed how successive row-id values are generated in an internal system table. Now, these values are more likely to be the same. These values are used when loading data using pipelines.
  • Now, improve the performance of LOAD DATA when it loads a file from a directory that contains many files.
  • Fixed a bug where memory usage being tracked by resource pools is negative. Also, corrected some tracelog messages regarding resource pool memory tracking.
  • Fixed an issue that caused user defined function (UDF) security checks to be done on leaf nodes instead of on the aggregator. This would cause a permission denied error because users created by a client application don’t exist on leaf nodes.
  • Now, if there are too many threads waiting for child threads to be scheduled, switch to non-parallel execution if possible. Fail the query otherwise. Previously, there was a thirty second timeout for launching child threads. This is too long of a wait for some workloads.
  • Fixed an issue where the constant folding optimization was disabled in edge cases. The fix speeds up queries that were affected by the issue.
  • Now, address a deadlock that is possible during REBALANCE PARTITIONS if the child aggregator runs out of connection threads.
  • Fixed a syncing issue seen when files were written to the plancache during code generation. This issue caused code generation failures and unrecoverable databases.

2019-07-15 Version 6.7.22

  • Removed some unneeded memory allocations that were being done before running parallel queries. This addresses the high CPU usage that sometimes occurred when running these queries.
  • Now, clear the DNS cache on secondary cluster nodes when nodes are modified on the primary cluster.
  • Fixed an issue where ALTER PIPELINE would go into an infinite loop.
  • Fixed a rarely occurring issue where resource pools were incorrectly tracking memory usage.
  • Fixed an issue where the use of AVX2 and SSE2 by LOAD DATA was disabled.

2019-07-01 Version 6.7.21

  • Fixed a stack overflow caused by queries that use many parameters.
  • Fixed an issue that was causing high system CPU usage. This issue occurred when running context switch heavy workloads. These workloads contain many fast executing queries that run against multiple connections.
  • Now, a leaf node fails over to its pair if, for sixty seconds, the leaf has all of its threads running queries (as specified by max_connection_threads) and is not maintaining a minimum throughput of five queries per second.
  • Now, a leaf node fails over to its pair if the leaf’s disk usage falls below the value of minimal_disk_space.
  • Fixed an issue where LAST_INSERT_ID(<expression>) was returning an incorrect value.

2019-06-03 Version 6.7.20

  • Fixed an issue where a user authenticated with Pluggable Authentication Module (PAM) could not run ANALYZE TABLE.
  • Now, the value of the SIZE field in information_schema.MV_BACKUP_HISTORY is stored in megabytes instead of bytes.
  • Now, improve the performance of SELECT ... INTO S3.

2019-05-20 Version 6.7.19

  • Fixed an issue where the bootstrap aggregator would not work on machines where the only configured address is the loopback address.

2019-05-06 Version 6.7.18

  • Now, speed up optimization for a query with a large IN list by not traversing the list in unnecessary places. Also, added the engine variable inlist_precision_limit that specifies the maximum number of list values to consider during histogram estimation for such a query; a smaller limit will result in a faster compilation time but may also give a less accurate estimate.
  • Now, reduce memory used to optimize queries that have many joins.

2019-04-29 Version 6.7.17

  • Now, add support for SSL certificates with passphrases. The passphrase for the certificate can be set via ssl_key_passphrase.
  • Now, when maximum_memory is lowered below maximum_table_memory, update maximum_table_memory to ninety percent of maximum_memory.
  • Now, improve the performance of changing permissions when sync_permissions is enabled and many users are created.
  • Fixed an issue where CREATE TABLE IF NOT EXISTS <tbl> ... AS SELECT would drop <tbl> if it already existed.
  • Fixed a memory leak that occured when sync_permissions was enabled and users’ permissions were changed.

2019-04-08 Version 6.7.16

  • Fixed a bug that caused database backup to Azure and database restore from Azure to fail.
  • Now, stored procedures, user-defined functions (including scalar-valued, table-valued, and aggregate) accept a maximum of 256 parameters. Built-in functions that accept a variable list of parameters can now accept a maximum of 256 parameters. Examples of such built-in functions are CONCAT and COALESCE.
  • Fixed an issue that prevented recovering replica partitions from being dropped when the replicas were trying to download data from nonexistent master partitions.
  • Now, memsqlctl query outputs Empty result set when it returns no rows.

2019-03-25 Version 6.7.15

  • Fixed a crash that is possible if the hash table built for a hash join contained more than 2 billion rows.
  • Fixed a deadlock that is possible if enough parallel queries are executed at the same time on a leaf.
  • Fixed a crash where FULL OUTER JOIN and USING clause together.
  • CLEAR ORPHAN DATABASES now clears out extra replica partitions that may be present due to past failed REBALANCE PARTITIONS operations.
  • Grouping by window functions will now throw an error instead of crash.
  • Fixed an issue with recovering columnstore data on restart if a replica partition was in the processing of being converted to a master when the node was shutdown.
  • memsqlctl delete-node now recognizes the --stop flag to stop the nodes before deletion.

2019-03-04 Version 6.7.14

  • Fixed a bug that caused memsqlctl describe-node to error when run on a stopped node.

2019-03-04 Version 6.7.13

  • Now, generate an error if GROUPING is used at an invalid location in a query.
  • Now, if REBALANCE PARTITIONS fails to synchronize a replicated partition because its replica is not reachable, it continues to synchronize the other replicated partitions.
  • Fixed a crash in an encoded query join using a many-to-many relationship.
  • Fixed a crash that can occur when BACKUP runs out of disk space while writing data using the Network File System (NFS).
  • Now, retry the first statement in a multi-statement transaction if the first statement fails due to a failover.
  • Now, at least one GB of the memory available to MemSQL must not be used for inserting new table data during query processing. This table memory may be used to allow MemSQL to load row store table data from disk to memory or by other MemSQL operations.
  • Now, display more fields in the output of memsqlctl describe-node.

2019-02-19 Version 6.7.12

  • Fixed LOAD DATA compilation failures on virtual machines which support AVX2 but omit it from CPUID.
  • Now allow SET autocommit and SET GLOBAL autocommit to be run against leaf nodes.
  • Now, on restart of the master aggregator, bring the aggregator’s database online even if a secondary database is unable to connect to the aggregator.
  • Fixed an issue where replication connections were allowed to request BLOB files without proper authentication.
  • Fixed a crash in a query using FULL OUTER JOIN and EXCEPT.
  • Now, on upgrade, change the CPU usage limits for all resource pools to one-hundred percent if the CPU usage limits have not been set previously.
  • Fixed an issue that was causing REBALANCE to intermittently fail with a “missing BLOB file due to replication lag” error where the BLOB file was not actually missing.

2019-02-04 Version 6.7.11

  • Fixed an issue where a slow-running DROP DATABASE query using reference tables was blocking the execution of a concurrent ALTER TABLE (or other DDL) query.
  • Added the global variable internal_set_user_timeout_on_connections, which can be set to true or false. When set to false, this disables the TCP_USER_TIMEOUT socket option, while keeping TCP keepalives enabled (assuming internal_set_keepalive_on_connections is set to true.)
  • Now, generate a clearer error message when a TCP connection drops after the TCP_USER_TIMEOUT threshold has been reached. (This assumes internal_set_user_timeout_on_connections is set to true). Now, also generate a clearer message when a TCP connection drops when keep-alive probes fail. (This assumes internal_set_keepalive_on_connections is set to true.)
  • Now internal_set_keepalive_on_connections can be disabled without having to restart the cluster.
  • Added support for more query shapes that previously generated errors because of unsupported complex joins.
  • Now, retry synchronizing replica partitions when running copy commands such as REBALANCE that have timed-out when they try to run PROMOTE after the copy.
  • Fixed an issue that was causing some clients to fail to execute prepared statements against MemSQL. To address this issue, MemSQL now more closely matches MySQL’s response to the COM_STMT_PREPARE protocol command.
  • Now prevent the LLVM Compiler Infrastructure from using AVX512 instructions which were causing code generation failures when MemSQL compiled queries.
  • Now remove some output from the SHOW CLUSTER STATUS command to make the output easier to read.
  • Now provide a more detailed error message when a cluster is missing a master partition.
  • Fixed a buffer overrun when MemSQL is using the binary protocol to send data back to the users application for string columns of a particular size.
  • The storage_type column has been added to information_schema.table_statistics.
  • Now the SCALAR function throws an error if its input query returns zero rows.
  • Fixed an issue that caused some MySQL client drivers to be incompatible with MemSQL’s SHOW CREATE PROCEDURE command. Now, the column headers for SHOW CREATE PROCEDURE match MySQL and show Procedure instead of Function.

2019-01-22 Version 6.7.10

  • Increased the maximum number of terms that can be searched in a full text query (a single prefix query like “a*” would cause multiple terms being searched).
  • Now, the out-of-order processing optimization for Kafka batches can be explicitly enabled or disabled. This optimization should be disabled if records in a batch must be processed in order. Note Records will be committed in order regardless of processing order. See CREATE PIPELINE or ALTER PIPELINE for more details.
  • Increased maximum transform argument size from 4k to 16k.

2019-01-14 Version 6.7.9

  • Now RESTORE of partition databases on leaves always denotes the databases as a partition in metadata. Previously, it was possible to trick the RESTORE command into restoring the partition as a reference database (such as when running BACKUP on a leaf node).

2019-01-07 Version 6.7.8

  • Now, SHOW CLUSTER STATUS uses connection pooling to communicate with other nodes in the cluster. This allows the command to run more safely when it is executed repeatedly.
  • Now provide more troubleshooting help when MemSQL runs out of memory when attempting to run a query.
  • Fixed an issue where BLOBs were not being restored during recovery of a secondary database.
  • Fixed an issue where the state column in the information_schema.PROCESSLIST table was not being cleared properly after a query ended.
  • Now provide more detailed information in the tracelog about heartbeat failures when a node fails.
  • Added a system variable auditlog_disk_sync that delays when the audit log gets written to the disk. This delay improves the performance of audit logging. This system variable delays disk syncs by default (i.e. the default value is false).
  • Fixed an issue where zero-byte files were being created in the plancache directory during a hard shutdown such as a power outage.
  • Fixed a master aggregator crash that occurred when running REBALANCE ... FORCE on a database with a table that has a shard key on a computed column.
  • The memsqlctl create-node command now requires a password to be specified.
  • Now, the memsqlctl start-node and memsqlctl restart-node commands do not end the memsqld_safe process if the commands cannot connect to the node.

2018-12-21 Version 6.7.7

  • Fixed an issue when sync_permissions is enabled on DR secondary clusters where users could not log in.
  • No longer truncate the results of the CURRENT_SECURITY_ROLES function to 64 characters if used inside a sub-query.
  • Now generate an error when attempting to create an array having a negative size.
  • Fixed an issue with the PERCENTILE_DISC function rounding to the wrong percentile.
  • Fixed an issue where the master aggregator incorrectly populated the database name in the FILE field of the information_schema.COLUMNAR_SEGMENTS table.

2018-12-18 Version 6.7.6

  • Fixed optimizer error when using binary builtins.
  • The information_schema.users table now includes local users when sync_permissions is enabled.
  • Added systemd service file to memsql-server packages. This allows memsqlctl-managed nodes to start automatically when the cluster starts.

2018-12-11 Version 6.7.5

  • Sped up LOAD DATA CSV parsing for files with short lines.
  • Fixed a crash when parsing LOAD DATA statements that used some of the optional clauses in particular orders.
  • Fixed a crash when revoking privileges from the root user when sync_permissions is enabled.
  • Fixed a MySQL wire protocol incompatibility with the FIELD_LIST protocol command. Newer MariaDB clients were getting “lost connection” errors as a result of this incompatibility.
  • Reference tables now reserve AUTO_INCREMENT values in batches of 1000 instead of 1 million.
  • Memsqld_safe now exits when it receives SIGABRT instead of passing it through to memsqld.
  • Removed misleading “REQUIRED_STRING” from the description of optional memsqlctl flags.
  • Added the --skip-disk-check flag to memsqlctl enable-high-availability to skip the safety check for free disk space.

2018-12-03 Version 6.7.4

  • Fixed a crash during code-generation when a large string is used inside an IN list.
  • Fixed a crash when querying information_schema.ADVANCED_HISTOGRAMS on a Disaster Recovery (DR) secondary cluster.
  • When MemSQL hits an out of memory error, a trace entry containing the query text of any query using more than 100 MB is now written to the tracelog to help diagnose what was using memory at the time of the error. Also included in each entry are the current query memory, average query memory, and activity name.
  • Removed --memsql-cnf flag from memsqlctl delete-node command.
  • Fixed incorrect result with correlated subselect expression using OR condition on a reference table.

2018-11-19 Version 6.7.3

  • Improved out-of-memory handling when retrieving autostats during query compilation.
  • Now allow SET SESSION autocommit = 0 commands against leaf nodes. Some MySQL clients will execute this statement when they connect, so disallowing it blocked users from connecting to leaves to monitor usage with clients that run this command. SET GLOBAL autocommit=0 remains blocked on leaf nodes. Disabling autocommit globally on leaves interferes with memsql’s ability to coordinate transactions.
  • Now block SELECT queries without FROM clauses from compiling when MemSQL memory use is nearing maximum_memory. Historically, SELECT queries with no FROM clauses were allowed to compile even if memory use was low as they played a part in cluster health checks that run against leaves. This is no longer the case, so there is no reason to risk compiling them when memory use is close to maximum_memory.
  • The CRC32 builtin function was not returning results consistent with the crc32c algorithm when compared to the result produced by external libraries.
  • Fixed tracking of missing blobs at the end of recovery. Previously, finishing recovery of a secondary database in the middle of a columnstore transaction could cause a missing blob to go untracked.
  • Added variable (enable_broadcast_left_join) to control whether the optimizer chooses the broadcast left join optimization.
  • Updated Brazilian time zones due to recent changes made by the country of Brazil. Specifically America/Campo_Grande, America/Cuiaba, and America/Sao_Paulo. Note: All Brazil/ time zones are deprecated.
  • Correctly block restoring an individual partition database backup on the master aggregator. Before this fix, the RESTORE command would fail but an empty reference database would be left behind that couldn’t be dropped. This fix also allows the left over empty reference database to be dropped if it had already been created before upgrading.
  • Now allow the memsqlctl configuration file path to be specified using the MEMSQLCTL_CONFIG environment variable.
  • Fixed bug in memsqlctl print-secure-key that caused it to incorrectly attempt writing the state file.

2018-11-06 Version 6.7.1

  • Initial GA release of MemSQL 6.7.