Outdated Version
You are viewing an older version of this section. View current production version.
6.8 Release Notes
The primary improvements in the MemSQL 6.8 release are enhancements to HDFS pipelines security and improved ad hoc (first run) query performance.
See the descriptions and changelog below for more information on these new features, as well as the other new features in the 6.8 release.
Data Loading
- Advanced HDFS Pipelines that import data from HDFS using Kerberos and wire encryption.
- Performance improvements to Pipelines extractors
- Added
TRAILING NULLCOLS
andNULLS DEFINED BY
clauses toLOAD DATA
andCREATE PIPELINE
Query Optimization, Compilation, and Statistics
- Can speed up ad-hoc (first query) run times by having queries interpreted and then dynamically compiled during first query execution. This behavior was available as experimental setting in MemSQL 6.7, but is production-ready and enabled by default in 6.8. See the interpret_first documentation for more details.
- Performance improvements for some outer, anti-, and semi-joins
- Performance improvements to
ROLLUP
andCUBE
Resource Governor Improvements
LOAD DATA
operations run in the resource pool for the current connection where the load operation is running- Stored procedures run in the resource pool of the current connection from where they are called
- Query optimization always runs in the resource pool of the current connection
Functional Extensions
- A subquery does not require an alias, assuming that removing the alias doesn’t create ambiguity
- Increased the precedence of
||
as concat (undersql_mode
=PIPES_AS_CONCAT
) to be compatible with MySQL and Postgres - Added
SPLIT()
andJSON_TO_ARRAY()
- Enabled
TO_DATE()
to support format strings with time-related format options (HH
,SS
, etc.) - Enabled
TO_DATE()
to support theDD
format option - Enabled
TO_TIMESTAMP()
to supportYY
andFF
format options - Enabled
TO_TIMESTAMP()
,TO_DATE()
, andTO_CHAR()
to support theD
format option - Enabled
TO_TIMESTAMP()
andTO_DATE()
to support using different punctuation as separator - Enabled
TO_TIMESTAMP()
andTO_DATE()
to raise an error instead of returningNULL
for certain error cases - Enabled
TO_CHAR()
to supportAM
andPM
- Modified how
TO_TIMESTAMP()
parses12
when using theHH
format option
Maintenance Release Changelog
2020-07-20 Version 6.8.24
- Fixed an issue where joins with reference tables are sometimes executed using Cartesian joins, even when faster joins are possible.
- SSL ciphers that use elliptic curves are now supported.
- Fixed errors that occurred when running
STOP REPLICATING
on a loopback (self replicating) secondary database.
2020-07-15 Version 6.8.23
- When
preserve_original_colstore_json
is set toON
, null values and empty arrays are preserved in JSON columns in columnstores.
2020-06-22 Version 6.8.22
- Reduced memory usage in certain outer join queries that use single table or constant filters.
- Changed the
QUERY_TEXT
column of theinformation_schema.MV_QUERIES
view to use theLONGTEXT
data type. - Improved predicate transitivity rewrites to avoid adding redundant predicates. This fixes compile timeouts for some queries with large numbers of predicates.
- Added an optimizer setting that allows you to adjust the cost of performing a cross join.
2020-05-18 Version 6.8.21
- Now, a secondary cluster can be brought online sooner, following a restart of the nodes in the cluster.
- Fixed an incorrect permissions denied error that occurred when a newly created user ran a query against a temporary table that already existed in the plancache before the user was created.
- Fixed an issue where queries, which both applied partitioning/sorting to window functions and used certain right joins, would crash or return incorrect results.
2020-04-07 Version 6.8.20
- Added the columns
STORAGE_TYPE
andDISTRIBUTED
toinformation_schema.TABLES
. These columns were previously returned when runningSHOW TABLES
, only. - Avoid inaccurate sampling estimates for expressions that use
CURRENT_SECURITY_ROLES()
. - Fixed a locking issue that could cause dropping a temporary table to fail.
- The
SHOW METADATA
permission allows a user to see all rows ininformation_schema.DISTRIBUTED_PARTITIONS
. - Added the
force_bushy_joins
engine variable that controls the use of bushy joins by the query optimizer. - When an
EXPLAIN
query runs, return a warning when the query is run on tables containing zero rows. - Added the
path
column to theinformation_schema.PLANCACHE
view. This column contains the path to the file where a query plan is stored on disk. - Fixed an issue where the
information_schema.TABLE_STATISTICS
could not be read if the cluster had offline partitions. - Show a warning in
EXPLAIN
output if a query exceeds thedistributed_optimizer_max_join_size
number of tables allowed for a complete join optimization.
2020-03-17 Version 6.8.19
- Fixed a crash that occurred during query rewrites for complex
UPDATE
andDELETE
queries having subselects. - Queries inside of stored procedures that use variables run as single partition queries if the queries only need to access data within single partitions. Previously, these types of queries did not run as single partition queries.
- Fixed a deadlock that is possible when the
RESTORE DATABASE
command fails and attempts to remove the restored database partitions that were created before the command failed. - No longer allow multiple leaf nodes to be added using host names or IPs that resolve to the same machine.
- Fixed an issue where specific queries would not properly account for the number of available threads.
- Fixed an error that occurred when a plan was dropped from the plancache during a distributed join.
2020-02-18 Version 6.8.18
- Now, increase the number of columns that
SELECT
queries can return. - Fixed an error that occurred when
SELECT ... GROUP BY ... ORDER BY RAND()
was run on a columnstore table. - Fixed an issue where cross-database distributed joins could fail to cleanup the temporary result tables created to run the join.
- Fixed a hang that occurred during an
ALTER TABLE
if more thanmax_connection_threads
queries are blocked from running on a child aggregator during the phase ofALTER TABLE
when queries are blocked. - Now, improve the performance of queries that use hash outer joins with some types of table filters.
2020-01-21 Version 6.8.17
- Now, HDFS Pipelines skip files in the data source that are under the
_temporary
directory. - Now,
LOAD DATA
incorporates the behavior of theignore_insert_into_computed_column
engine variable. - Added the clause
NULL DEFINED BY ... OPTIONALLY ENCLOSED
to the syntax ofLOAD DATA ...
. When this clause is used,LOAD DATA
countsNULL
values enclosed with quotes asNULL
s.
2019-12-16 Version 6.8.16
- Fixed a memory leak in queries that use full-text indexes.
- Fixed an issue with a specific
INSERT ... SELECT ... UPDATE
query that caused a crash instead of returning an error. - Fixed an issue that may cause pipeline extractor subprocesses to hang.
- Fixed a crash that occurred when
ORDER BY RAND()
was used in a right join or a full join. - Added the
compatibility_mode
option to the S3 backup functionality. This enables an S3 backup to be used with Google Cloud Storage and other third-party compatible APIs.
2019-12-09 Version 6.8.15
- Fixed hung queries that resulted from invalid UTF8 characters.
- Fixed an issue that occurred when assigning the result of
SPLIT
to variable that isn’t an array of strings. Now, the result is type casted to an array of strings, orSPLIT
returns an error when type casting isn’t possible. - Fixed an issue that could occur while replicating user/grant metadata within a cluster. The issue could cause incorrect permission checks if the
sync_permissions
engine variable is enabled.
2019-12-02 Version 6.8.14
- Now, you can run
CREATE TEMPORARY TABLE
concurrently, on multiple connections, against the same table. The same applies toDROP TEMPORARY TABLE
. - Added support for some
SHOW CHARACTER SET WHERE ...
clauses. This is needed for MySQL Workbench 8.0 support. - Now,
ALTER TABLE
statements that add computed columns will no longer hang if the computed column’s expression uses a built-in function that throws an error during execution.
2019-11-11 Version 6.8.13
- Fixed issues with the
information_schema.MV_SYSINFO
tables used by MemSQL Studio to manage CPU usage of running processes. - Added the
performance_schema
read-only engine variable. This allows newer MySQL Connector/J and JDBC drivers to connect to MemSQL. - Now,
ALTER TABLE
andTRUNCATE TABLE
only wait for garbage collection to complete if a previous operation triggered garbage collection on the same table. Previously,ALTER TABLE
andTRUNCATE TABLE
waited for garbage collection to complete on any table. - Now, improve performance of garbage collection on the plancache, when the plancache is large. Garbage collection on the plancache is triggered when the
plan_expiration_minutes
engine variable expires.
2019-11-04 Version 6.8.12
- Fixed an issue causing the workload manager to share the same queue for memory intensive, connection intensive, and thread intensive queries. Now, the workload manager uses a queue for memory intensive queries and another queue for connection and thread intensive queries.
- Fixed a crash that occurred when querying the
information_schema.PIPELINES_ERRORS
table. - Fixed an issue that caused resource pools to gather incorrect memory use statistics for some queries.
- Fixed a crash that occurred when a stored procedure used by a pipeline writes to a different database than where the pipeline is located.
2019-10-21 Version 6.8.11
- Fixed an issue where running
SHOW PLANCACHE
or queryinginformation_schema.PLANCACHE
could cause a server crash if the engine runs out of memory while performing these operations. - Fixed an issue where running
KILL QUERY
on anALTER VIEW
query could cause new queries accessing the view to go into an infinite loop. - Reduced the query compilation time for some complex queries.
- Now,
SET PASSWORD
can be used to set the password for theroot
user. This applies when thesync_permissions
engine variable is set toON
. - Fixed an issue where no error was returned when large negative row-ids were present in an internal table. These row-ids are used when loading data using pipelines.
- Fixed an out-of-memory issue that could occur in queries with multiple subselects and a
WHERE
clause containing a constant predicate. - Fixed a crash that occurred when running a
REVOKE
statement containing anON
clause without a database name. - Fixed an out-of-memory issue that occurred during the compilation of query shapes having many unions. These query shapes use many parameters.
2019-09-16 Version 6.8.10
- Now, report an error when running out of memory while allocating stack space. This applies when
interpreter_mode
is set tointerpret_first
. - Added support for server-side encryption when backing up to S3.
- Added the
max_async_compilation_concurrency
engine variable to limit the number of compiles that can run at the same time wheninterpreter_mode
is set tointerpret_first
. Also, changed the default number of concurrent async compiles to half the number of cores on the host machine. It was previously set to the number of cores on the host machine. - Added the
information_schema.MV_WORKLOAD_MANAGEMENT_STATUS
table, which shows the status of workload management. - Added an optimization for
SELECT
queries that have specific shapes. The optimization allows these queries to compile faster. - Added the
max_compilation_time_s
andmax_compilation_memory_mb
engine variables. The former is the timeout for compiling a query. If the timeout is reached, an error is returned. The latter is the maximum amount of memory used to compile a query. If this maximum is reached, an error is returned. - Added the
information_schema.LMV_ASYNC_COMPILES
table, which shows the async compiles that are running or queued. Also, added theKILL QUERY COMPILE
command that allows you to kill compiles in theLMV_ASYNC_COMPILES
table.
2019-09-03 Version 6.8.9
- Fixed an issue where
TRUNCATE TABLE
caused unrecoverable replica partitions. - Now, allow a Kafka pipeline to continue running if the latest batch partition offset is lower than the earliest offset.
- Fixed the permission check on
CREATE TABLE ... SELECT
queries to correctly check for the SELECT permission type on the tables being selected. Prior to the fix, the tables being selected were checked for the permission type required by the previous compiled query. - Fixed an issue where the
TOTAL_QUEUE_TIME
field ininformation_schema.MV_RESOURCE_POOL_STATUS
would become negative. - Now, do not allow the setting
MAX_QUEUE_DEPTH
to be specified in a resource pool when theMAX_CONCURRENCY
setting is not specified in the resource pool. - Now, reduce the amount of tracing done by
EXPLAIN CLEAR ORPHAN DATABASES
. - Fixed an issue where a user could run
GRANT <privileges> ON <resources> TO <role>
if the user didn’t have the privileges being granted. Now, this GRANT query fails if the user doesn’t have these privileges. - Fixed an issue where cluster-wide permissions could be granted, at the database level, to roles. These permissions can no longer be granted at the database level.
2019-08-19 Version 6.8.8
- Now, improve the performance of restoring a database that contains many small columnstore BLOB files.
- Fixed a crash that occurred when
LOAD DATA
encountered an error on a line of input whose text is larger than 100 MB. - Fixed an issue where file descriptors were sometimes left open when processes external to the engine (such as pipeline processes) encounter errors.
2019-08-12 Version 6.8.7
- Added
information_schema
tables containing theMV_GLOBAL
prefix. These tables can be queried to view global variable settings and server status information. - Fixed an issue where a global sync variable that was set on the master aggregator would not sync to other nodes in the cluster.
- Changed how successive row-id values are generated in an internal system table. Now, these values are more likely to be the same. These values are used when loading data using pipelines.
2019-08-05 Version 6.8.6
- Added
information_schema
tables containing theMV_SYSINFO
prefix. These tables can be queried to monitor the CPU, memory, disk, and network usage of the cluster. - Added the
information_schema.MV_PROCESSLIST
table. This table can be queried to view the queries that are currently running in the cluster. - Now, improve the performance of
LOAD DATA
when it loads a file from a directory that contains many files. - Fixed a bug where memory usage being tracked by resource pools is negative. Also, corrected some tracelog messages regarding resource pool memory tracking.
2019-07-29 Version 6.8.5
- Fixed an issue that caused user defined function (UDF) security checks to be done on leaf nodes instead of on the aggregator. This would cause a permission denied error because users created by a client application don’t exist on leaf nodes.
2019-07-22 Version 6.8.4
- Removed some unneeded memory allocations that were being done before running parallel queries. This addresses the high CPU usage that sometimes occurred when running these queries.
- Now, clear the DNS cache on secondary cluster nodes when nodes are modified on the primary cluster.
- Fixed an issue where
ALTER PIPELINE
would go into an infinite loop. - Fixed a rarely occurring issue where resource pools were incorrectly tracking memory usage.
- Fixed an issue where the use of AVX2 and SSE2 by
LOAD DATA
was disabled. - Now, if there are too many threads waiting for child threads to be scheduled, switch to non-parallel execution if possible. Fail the query otherwise. Previously, there was a thirty second timeout for launching child threads. This is too long of a wait for some workloads.
- Now, remove MemSQL’s dependency on the
libnsl.so.1
library to make it easier to deploy MemSQL on flavors/versions of Linux such as RHEL 8 that don’t havelibnsl.so.1
installed by default. - Fixed an issue where the constant folding optimization was disabled in edge cases. The fix speeds up queries that were affected by the issue.
- Now, address a deadlock that is possible during
REBALANCE PARTITIONS
if the child aggregator runs out of connection threads. - Fixed a syncing issue seen when files were written to the plancache during code generation. This issue caused code generation failures and unrecoverable databases.
- Now, disable correlated subselects when using
ROLLUP()
andCUBE()
, as it is currently unsupported. Also, fixed an issue where an optimization incorrectly reordered aGROUP BY
and join whenROLLUP()
orCUBE()
was being used. - Added new configuration settings to Kafka pipelines to support explicit Kerberos credential caches (
sasl.kerberos.ccache
) and disabling of kinit (sasl.kerberos.disable.kinit
). - Now, a pipeline waits for the reference database to be in sync before the pipeline compiles.
2019-06-24 Version 6.8.3
- Fixed an issue when
interpreter_mode
is set tointerpret_first
, where incorrect code was generated. This issue could cause a crash or an infinite loop. - Fixed an issue where
memsqlctl
wouldn’t show license details.
2019-06-10 Version 6.8.2
- Fixed an issue that was causing high system CPU usage. This issue occurred when running context switch heavy workloads. These workloads contain many fast executing queries that run against multiple connections.
- Now, a leaf node fails over to its pair if, for sixty seconds, the leaf has all of its threads running queries (as specified by
max_connection_threads
) and is not maintaining a minimum throughput of five queries per second. - Now, a leaf node fails over to its pair if the leaf’s disk usage falls below the value of
minimal_disk_space
. - Fixed an issue where
LAST_INSERT_ID(<expression>)
was returning an incorrect value. - Now, security management commands such as
CREATE GROUP
,CREATE ROLE
andCREATE USER
can be run inside of stored procedures. - Now, improve the performance of
SELECT ... INTO S3
. - Now, the value of the
SIZE
field ininformation_schema.MV_BACKUP_HISTORY
is stored in megabytes instead of bytes. - Fixed an issue where a user authenticated with Pluggable Authentication Module (PAM) could not run
ANALYZE TABLE
. - Fixed an issue where the bootstrap aggregator would not work on machines where the only configured address is the loopback address.
2019-05-14 Version 6.8.1
- Initial GA release of MemSQL 6.8