6.5 Release Notes
The MemSQL 6.5 release includes new features, functionality, and performance improvements such as support for HDFS pipelines, full text search, histogram improvements for cardinality estimation, extensibility improvements, and more.
See the descriptions and changelog below for more information on these new features.
Data Loading
HDFS Pipeline Support
A new HDFS pipeline allows you to perform ETL operations on files from Hadoop, Pig, or Spark jobs. See HDFS Pipelines Overview for more details.
Kerberos and SSL support for Kafka Pipelines
You can now configure Kafka pipelines to connect to your Kafka brokers through SSL and optionally authenticate with Kerberos. See Enabling SSL and Kerberos on Kafka Pipelines for more details.
Pipeline Support with Stored Procedures
In previous versions of MemSQL, each pipeline could insert into only one table. In MemSQL 6.5 and later, you can insert into multiple tables from one pipeline by specifying a stored procedure in a CREATE PIPELINE command.
Stored procedure support in pipelines also enables new scenarios such as data transformation in SQL. See the INTO PROCEDURE section of CREATE PIPELINE for more details.
Performance improvements for loading into S3 Pipelines
S3 pipelines now have the ability to process new files in a bucket at a much faster rate without incurring the penalty of listing all of the files in the bucket before performing the ETL operations of the pipeline. Previously, this could be a slow operation if you had a bucket with a large file count.
The only requirement to enable this new functionality is to prefix filenames in your bucket with an increasing alpha-numeric value, such as a timestamp or some other marker (e.g. YYYY-MM-DD-filename.extension). You do not need to add any configuration elements to your CREATE PIPELINE statement.
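To see why an increasing prefix helps, consider the following minimal Python sketch. S3 lists keys in lexicographic order, so with zero-padded date prefixes every new file sorts after the files already processed. The sketch assumes (this is not stated above) that a loader can resume from the last key it processed; the helper `keys_after` and the file names are hypothetical.

```python
from bisect import bisect_right

def keys_after(sorted_keys, last_processed):
    """Return the keys that sort strictly after last_processed."""
    return sorted_keys[bisect_right(sorted_keys, last_processed):]

# Zero-padded date prefixes sort lexicographically in chronological order.
keys = sorted([
    "2018-07-10-events.csv",
    "2018-07-01-events.csv",
    "2018-07-02-events.csv",
])
print(keys_after(keys, "2018-07-02-events.csv"))  # ['2018-07-10-events.csv']

# Without zero padding, lexicographic order no longer matches date order:
# "2018-7-10-..." sorts before "2018-7-2-...".
unpadded = sorted(["2018-7-2-events.csv", "2018-7-10-events.csv"])
print(unpadded)
```

The second example suggests why a fixed-width prefix such as YYYY-MM-DD is a safer choice than an unpadded date.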
Backup and Restore from S3
You can now back up your databases to an S3 bucket and restore them from it. See BACKUP and RESTORE for syntax and additional details.
General Improvements
- Performance improved for columnstore compression
- Support for using .tar archives for columnstore backups
- Ability to debug and test out transforms by running new EXTRACT PIPELINE INTO OUTFILE command
- New PROFILE PIPELINE command useful for debugging pipeline bottleneck issues
- Performance improvements for some of the pipelines and columnar information_schema tables
- Improved performance for LOAD DATA when loading CSVs that contain large numbers of columns
- The new BACKUP_HISTORY table provides important metadata on recent successful backups
- The new MV_EVENTS table provides a history of cluster-level events such as nodes attaching to a cluster or the rebalancing of partitions
- The new LOAD_DATA_STATUS table tells you how much data has been ingested during a LOAD DATA operation
Query Language Additions
FULL OUTER JOIN now supported
See SELECT for more details on usage.
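As a reminder of the semantics, a full outer join keeps every row from both sides and fills in the missing side with NULL. The following Python sketch illustrates the idea only; it assumes unique join keys, and `full_outer_join` and the sample data are hypothetical, not MemSQL syntax.

```python
def full_outer_join(left, right):
    """Join two dicts on their keys; a missing side is returned as None."""
    keys = sorted(set(left) | set(right))
    return [(k, left.get(k), right.get(k)) for k in keys]

customers = {1: "Ada", 2: "Grace"}          # customer_id -> name
orders = {2: "order-17", 3: "order-42"}     # customer_id -> order

rows = full_outer_join(customers, orders)
for row in rows:
    print(row)
# (1, 'Ada', None)
# (2, 'Grace', 'order-17')
# (3, None, 'order-42')
```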
PIVOT
You can now transform non-aggregated event data into a pivot table output format using the PIVOT clause in a SELECT statement. See PIVOT for more details.
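Conceptually, pivoting turns the distinct values of one column into output columns and aggregates a measure under each. The Python sketch below shows that transformation on a small event list; it is an illustration of the idea rather than the PIVOT syntax, and `pivot_sum`, the sample events, and the choice to report 0 for empty cells are assumptions.

```python
from collections import defaultdict

def pivot_sum(rows, pivot_values):
    """Sum amounts per (group_key, pivot_key); rows are (group, pivot, amount)."""
    # Each group starts with one zeroed cell per requested pivot value.
    table = defaultdict(lambda: dict.fromkeys(pivot_values, 0))
    for group_key, pivot_key, amount in rows:
        cells = table[group_key]
        if pivot_key in cells:  # ignore pivot values that were not requested
            cells[pivot_key] += amount
    return dict(table)

events = [
    ("east", "Q1", 10),
    ("east", "Q2", 5),
    ("west", "Q1", 7),
    ("east", "Q1", 3),
]
result = pivot_sum(events, ["Q1", "Q2"])
print(result)
# {'east': {'Q1': 13, 'Q2': 5}, 'west': {'Q1': 7, 'Q2': 0}}
```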
Full Text Search
Full text search allows searching for words or phrases in columnstore table columns with a large body of text. See Full Text Search for more details.
Query Execution
- Integer run-length encoding is now supported for encoded filters. See Data Encodings Supported for more details.
- Improved correlated subselect support by removing restrictions and allowing dependent fields in the left expression of an IN subselect
- Support for inserting into a table with a computed column as its shard key and ignore_insert_into_computed_column = ON
- New data conversion functionality that throws errors for integer under/overflow and string truncation issues. Controlled through the new system variable data_conversion_compatibility_level. See System Variables for more information.
- Improved performance for queries that move large amounts of data between nodes
- Improved performance of GROUP BY with many groups
Query Optimization
- Improved performance for queries that use windowed aggregate functions combined with filters
- Improved histograms and cardinality estimation
- Improved selection of encoded GROUP BY plans
- Improved cardinality estimation for BETWEEN filters
Extensibility
- Support for multiple result sets from a single stored procedure. See CREATE PROCEDURE for more details.
- MemSQL now supports certain DDL statements inside stored procedures. See CREATE PROCEDURE for more details.
Cluster Manageability
SSL for cross-cluster communication decoupled from SSL for intra-cluster communication
You can now configure your cluster to use SSL for communication between clusters during replication only, and have it off for local communication between nodes. This can be useful if the performance cost of securing intra-cluster communication is too high for your workload. See Server Configuration for Secure Client and Intra-Cluster Connections for more details.
Resource Governance
Memory limits can now be set to prevent unintended queries from consuming all available query execution memory in the cluster. See Setting Resource Limits for more details.
Workload Management
MemSQL now estimates the amount of memory required to execute queries and only runs those queries if sufficient memory is available. See Workload Management for more information.
Synchronize User Permissions Across the Cluster
You can now set and update cluster-wide user permissions on the master aggregator and have those changes synchronized to all nodes in your cluster by setting sync_permissions to ON on your master aggregator.
General Improvements
- Improved AUTO_INCREMENT behavior during cluster restarts
Maintenance Release Changelog
2019-07-30 Version 6.5.27
- Now, if too many threads are waiting for child threads to be scheduled, the query switches to non-parallel execution if possible and fails otherwise. Previously, there was a thirty-second timeout for launching child threads, which is too long a wait for some workloads.
- Fixed a deadlock that is possible during REBALANCE PARTITIONS if the child aggregator runs out of connection threads.
- Fixed a syncing issue seen when files were written to the plancache during code generation. This issue caused code generation failures and unrecoverable databases.
2019-07-08 Version 6.5.26
- Fixed an issue that was causing high system CPU usage. This issue occurred when running context switch heavy workloads. These workloads contain many fast executing queries that run against multiple connections.
- Now, a leaf node fails over to its pair if, for sixty seconds, the leaf has all of its threads running queries (as specified by max_connection_threads) and is not maintaining a minimum throughput of five queries per second.
- Now, a leaf node fails over to its pair if the leaf’s available disk space falls below the value of minimal_disk_space.
- Fixed an issue where the bootstrap aggregator would not work on machines where the only configured address is the loopback address.
- Removed some unneeded memory allocations that were being done before running parallel queries. This addresses the high CPU usage that sometimes occurred when running these queries.
2019-03-25 Version 6.5.25
- Fixed a crash that is possible if the hash table built for a hash join contained more than 2 billion rows.
- Fixed a deadlock that is possible if enough parallel queries are executed at the same time on a leaf.
- Fixed a crash that can occur if the NFS target of a BACKUP runs out of disk.
- Now transition reference databases online when the master aggregator restarts even if a secondary is having trouble connecting.
2019-02-19 Version 6.5.24
- Added the global variable internal_set_user_timeout_on_connections, which can be set to true or false. When set to false, this disables the TCP_USER_TIMEOUT socket option, while keeping TCP keepalives enabled (assuming internal_set_keepalive_on_connections is set to true).
- Now, generate a clearer error message when a TCP connection drops after the TCP_USER_TIMEOUT threshold has been reached. (This assumes internal_set_user_timeout_on_connections is set to true.) Now, also generate a clearer message when a TCP connection drops when keep-alive probes fail. (This assumes internal_set_keepalive_on_connections is set to true.)
- Now internal_set_keepalive_on_connections can be disabled without having to restart the cluster.
- Now prevent the LLVM Compiler Infrastructure from using AVX512 instructions which were causing code generation failures when MemSQL compiled queries.
- Fixed an issue that was causing REBALANCE to intermittently fail with a “missing BLOB file due to replication lag” error where the BLOB file was not actually missing.
- Fixed an issue where replication connections were allowed to request blob files without proper authentication.
2019-01-14 Version 6.5.23
- Fixed an issue where BLOBs were not being restored during recovery of a secondary database.
- Fixed an issue where the state column in the information_schema.PROCESSLIST table was not being cleared properly after a query ended.
- Added a system variable auditlog_disk_sync that delays when the audit log gets written to the disk. This delay improves the performance of audit logging. This system variable delays disk syncs by default (i.e., the default value is false).
- Fixed a master aggregator crash that occurred when running REBALANCE ... FORCE on a database with a table that has a shard key on a computed column.
- Fixed an issue where zero-byte files were being created in the plancache directory during a hard shutdown such as a power outage.
- Fixed an issue when sync_permissions is enabled on DR secondary clusters where users could not log in.
- No longer truncate the results of the CURRENT_SECURITY_ROLES function to 64 characters if used inside a sub-query.
- Fixed an issue where the master aggregator incorrectly populated the database name in the FILE field of the information_schema.COLUMNAR_SEGMENTS table.
- Now generate an error when attempting to create an array having a negative size.
- Fixed optimizer error when using binary builtins.
- Fixed a crash when revoking privileges from the root user when sync_permissions is enabled.
- Fixed a MySQL wire protocol incompatibility with the FIELD_LIST protocol command. Newer MariaDB clients were getting “lost connection” errors as a result of this incompatibility.
- Reference tables now reserve AUTO_INCREMENT values in batches of 1000 instead of 1 million.
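One way to picture the effect of the smaller AUTO_INCREMENT batch size: if an allocator reserves a contiguous batch of values up front and discards the unused remainder when it restarts, the gap left in the sequence is bounded by the batch size. The Python sketch below is a toy model under that assumption; `BatchAllocator` and its restart behavior are hypothetical and are not a description of MemSQL's implementation.

```python
class BatchAllocator:
    """Toy allocator that reserves increasing values in fixed-size batches."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.reserved_until = 0  # highest value reserved so far
        self.next_value = 1      # next value to hand out

    def next_id(self):
        if self.next_value > self.reserved_until:
            # Current batch is exhausted: reserve the next one.
            self.reserved_until += self.batch_size
        value = self.next_value
        self.next_value += 1
        return value

    def restart(self):
        # Assumption: unused values in the current batch are discarded.
        self.next_value = self.reserved_until + 1

def first_id_after_restart(batch_size, ids_used):
    """Hand out ids_used values, restart, and return the next value."""
    allocator = BatchAllocator(batch_size)
    for _ in range(ids_used):
        allocator.next_id()
    allocator.restart()
    return allocator.next_id()

print(first_id_after_restart(1000, 3))       # 1001
print(first_id_after_restart(1_000_000, 3))  # 1000001
```

In this model, three inserts followed by a restart skip fewer than 1000 values with the smaller batch, compared with nearly one million with the larger one.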
2018-12-03 Version 6.5.22
- Fixed a crash during code generation when a large string is used inside an IN list.
- Fixed a crash when querying information_schema.ADVANCED_HISTOGRAMS on a Disaster Recovery (DR) secondary cluster.
- Fixed an issue where finishing the recovery of a secondary database in the middle of a columnstore transaction could cause log corruption if replication was stopped immediately after recovery finished and before replication connected.
2018-11-26 Version 6.5.21
- When MemSQL hits an out of memory error, a trace entry containing the query text of any query using more than 100 MB is now written to the tracelog to help diagnose what was using memory at the time of the error. Also included in each entry are the current query memory, average query memory, and activity name.
- Now block SELECT queries without FROM clauses from compiling when MemSQL memory use is nearing maximum_memory. Historically, SELECT queries with no FROM clauses were allowed to compile even when available memory was low, as they played a part in cluster health checks that run against leaves. This is no longer the case, so there is no reason to risk compiling them when memory use is close to maximum_memory.
- Fixed tracking of missing blobs at the end of recovery. Previously, finishing recovery of a secondary database in the middle of a columnstore transaction could cause a missing blob to go untracked.
2018-11-13 Version 6.5.20
- Correctly block restoring an individual partition database backup on the master aggregator. Before this fix, the RESTORE command would fail but an empty reference database would be left behind that couldn’t be dropped. This fix also allows the leftover empty reference database to be dropped if it had already been created before upgrading.
- Updated Brazilian time zones, specifically America/Campo_Grande, America/Cuiaba, and America/Sao_Paulo, due to recent changes made by the country of Brazil. Note: All Brazil/ time zones are deprecated.
- The CRC32 builtin function now returns results consistent with the crc32c algorithm when compared to the result produced by external libraries.
- Added a variable (enable_broadcast_left_join) to control whether the optimizer chooses the broadcast left join optimization.
2018-10-29 Version 6.5.19
- Sped up statistics metadata table lookups done by the query optimizer by using a new index.
- Fixed an issue with escaping database names that have SQL keywords in them when running REBALANCE PARTITIONS.
- When disaster recovery replication is stopped, the remote name in information_schema.distributed_databases is now cleared.
- Fixed an issue in the collation handling of the LOAD DATA SET clause which could cause the LOAD DATA query to hang.
- No longer allow long-running queries whose plans have been removed from the in-memory plan cache to block DROP TABLE on tables not involved in the long-running query.
- Added a timeout to the lock taken on aggregators during a REBALANCE that blocks new write queries from starting. This timeout prevents long-running write queries from blocking the REBALANCE when the REBALANCE is blocking new writes from starting.
- No longer allow replica partitions that are missing blob files due to replication lag to be promoted to master partitions. This fix prevents errors such as “Can’t open file: ‘columns/db/5/20575/3921917’” after a failover event.
- Now upgrade pipeline metadata for files no longer in the source after upgrade.
- Addressed noisy-neighbor issue with cluster database replication when a node on the cluster, or network, slows down replication.
- Fixed wrong result issues with columnstore reference tables used in left joins against sharded tables.
2018-10-08 Version 6.5.18
- Fixed a crash caused by granting permissions to a GROUP using only the table name in the GRANT without a context database set on the connection running the grant (i.e., no USE <db> had been run on the connection).
- Reduced the amount of data written to the cluster database during a failover if a node in the cluster is slow to replicate the new location of master partitions.
- Now an application cannot use up all connection threads (up to max_connection_threads) on a leaf node when running an online failover (i.e., during a REBALANCE PARTITIONS operation). There are periods of time when MemSQL blocks running queries on a leaf when running a failover. This is because MemSQL requires some connection threads to run the failover, and this change ensures the query workload cannot use them all.
- Now allows system variables to be used in computed columns again. They were blocked in earlier versions of MemSQL 6.5.
- Added the following CONFIG knobs to KAFKA pipelines. Kafka metadata queries will be retried up to three times.
  - "metadata.request.timeout.ms": Non-topic request timeout in milliseconds. This is for metadata requests, etc.
  - "topic.metadata.refresh.fast.interval.ms": When a topic loses its leader a new metadata request will be enqueued with this initial interval, exponentially increasing until the topic metadata has been refreshed. This is used to recover quickly from transitioning leader brokers.
  - "topic.metadata.refresh.interval.ms": Topic metadata refresh interval in milliseconds. The metadata is automatically refreshed on error and connect. Use -1 to disable the intervalled refresh.
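The exponentially increasing interval described for "topic.metadata.refresh.fast.interval.ms" can be pictured with a short Python sketch. It is only an illustration: the doubling factor, the optional cap, the 250 ms starting value, and the function name `fast_refresh_schedule` are assumptions, not values taken from MemSQL or from the Kafka client.

```python
def fast_refresh_schedule(initial_ms, attempts, factor=2, cap_ms=None):
    """Return the wait before each metadata retry, growing exponentially."""
    delays = []
    delay = initial_ms
    for _ in range(attempts):
        # Apply the cap, if any, without disturbing the underlying growth.
        delays.append(delay if cap_ms is None else min(delay, cap_ms))
        delay *= factor
    return delays

print(fast_refresh_schedule(250, 5))               # [250, 500, 1000, 2000, 4000]
print(fast_refresh_schedule(250, 5, cap_ms=1000))  # [250, 500, 1000, 1000, 1000]
```

Starting from a small interval lets a pipeline notice a new leader quickly, while the growth keeps repeated metadata requests from piling up on the brokers if recovery takes longer.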
2018-09-17 Version 6.5.17
- Fixed an issue with char(0) columns on columnstore.
- Can now restore a backup with user-defined functions (UDFs) into a database with a different name.
- Fixed a leaf syntax error that can occur when type casts are internally added to variables in full text search queries.
2018-09-13 Version 6.5.16
- Updated Kerberos dependencies (libkeyutils) on Debian-based platforms.
2018-09-12 Version 6.5.15
- PL variables can now be used in full text search queries which define the body of a query type variable.
- Now allow cross-cluster replication between a MemSQL 6.0 primary and MemSQL 6.5 secondary clusters.
2018-09-10 Version 6.5.14
- Now allow DROP PARTITION to drop offline partitions.
- MATCH and HIGHLIGHT full text search functions now work with PL parameters inside stored procedures.
- Fixed an issue with IN lists and shuffle group by.
- Allow upgrade to 6.5 for VARBINARY columns with max length greater than 64k.
- Moved sasl module libraries into a separate folder under objdir to eliminate syslog warnings.
2018-08-22 Version 6.5.12
- Now support PIVOT in views.
- Made ROLLBACK of transactions involving reference tables in stored procedures work as expected.
- Fixed a crash when certain information_schema queries were hinted through the leaf_pushdown_default variable.
- Improved out-of-memory handling in ANALYZE.
2018-08-14 Version 6.5.11
- First boot after upgrade from 5.8 to 6.5 will not reuse auto increment values, even if they were deleted.
2018-08-06 Version 6.5.10
- Fixed an issue where clusters with between 10 and 30 nodes couldn’t have disaster recovery replicas.
- Changed the default pipelines_kafka_version back to 0.8.2.2 (the MemSQL 6.0 default). Using a higher default version with older Kafka servers was causing performance issues.
- Fixed an issue with AUTO_INCREMENT values getting reset to 0 after an offline upgrade.
2018-07-24 Version 6.5.9
- Initial GA release of MemSQL 6.5.
- Fixed an issue where running management view queries caused ALTER TABLE to be unresponsive.