You are viewing an older version of this section. View current production version.
7.0 Release Notes
The main features of the MemSQL 7.0 release are highlighted below.
Release Highlights
-
Much improved performance of synchronous replication and durability. Our synchronous replication now has only a minor performance penalty compared to our already fast asynchronous replication – giving you better data consistency guarantees without incurring large performance penalties. You can read more about our improved synchronous replication here.
-
Selective queries on columnstore tables now run faster, via sub-segment access.
-
Hash indexes on columnstore tables are now supported.
CREATE TABLE
andALTER TABLE
now support adding and dropping hash indexes on columnstore tables. -
By default,
UPDATE
andDELETE
queries now lock columnstore tables at the row level, allowing improved concurrency. Previously, these queries locked columnstore tables on the segment level. -
Sparse rowstore compression is now supported.
-
Queries on columnstore tables can now reorder filters to decrease execution time. Reordering occurs automatically and allows filters that are more selective to be evaluated first.
To deploy this release, follow the appropriate guide.
To upgrade a self-managed install to this release, follow this guide.
To make a backup of a database in this release or to restore a database backup to this release, follow this guide.
In addition to the release highlights noted above, the following new features and improvements are available in this release:
Query Execution
- Improved Workload Management implementation that no longer needs to be tuned via the
workload_management_expected_aggregators
engine variable; this variable is now deprecated. - Support for multi-statement transactions that use distributed joins.
- Support for multi-statement transactions that write to a database and then read from a different database.
- Added a session variable
node_degree_of_parallelism
that specifies the number of threads per node to use for columnstore table scans. - Queries that use the
IN
filter on shard keys now perform better. - Hash joins that use reference tables now perform better.
- The
data_conversion_compatibility_level
engine variable can now be set to7.0
. This allows errors to be thrown for more invalid type conversions, instead of doing implicit type conversions, where the user could see unexpected results. - Now, a maximum of 1024 tables can be written to in a transaction.
Query Optimization
- Column statistics are now automatically gathered on rowstore tables. This is in addition to column statistics being automatically gathered on columnstore tables, which was done previously.
- Range statistics (histograms) are now automatically gathered on rowstore and columnstore tables.
- Rowstore table sampling has been improved.
- Now, a true row-level random sample of rows from columnstore tables is maintained automatically and used to estimate the selectivity of complex predicates.
- All shapes of bushy joins are now supported.
- Common Table Expressions (CTEs) can now be materialized; redundant expressions can use cached data instead of being recomputed.
Data Storage, Loading, and Backup
- Performance improvements for bulk inserts, large insert/selects,
LOAD DATA
, and pipelines to columnstore tables (particularly for wide columnstore tables). - Added support for publishing data to Kafka via SELECT … INTO KAFKA ….
- Added support for Parquet in
CREATE PIPELINE
. - Improved error reporting for pipelines and
LOAD DATA
. - Incremental backup for columnstore data.
System Management
- Added a column to
information_schema.TABLE_STATISTICS
to indicate if a table is rowstore or a columnstore. - Added columns to the
MV_QUERY_ACTIVITIES
andMV_QUERY_ACTIVITIES_EXTENDED_CUMULATIVE
views. - Added the
MV_GLOBAL_STATUS
,MV_GLOBAL_VARIABLES
,MV_SYSINFO_CPU
,MV_SYSINFO_CPU_DISK
,MV_SYSINFO_CPU_LIST
,MV_SYSINFO_MEM
, andMV_SYSINFO_NET
views.
Functional Extensions
- Cross-database views are now supported.
- Added built-in support for time series reporting through the new
FIRST
,LAST
, andTIME_BUCKET
functions. GROUP_CONCAT()
now supports theORDER BY
clause.- Added the
MEDIAN()
aggregate function.
Maintenance Release Changelog
2020-12-07 Version 7.0.24
- Fixed a crash caused by timestamp values during daylight savings time changes in columnstore tables using a timestamp as a sort key or unique key.
- Fixed a crash that occurred when querying
information_schema.COLUMNAR_SEGMENTS
,information_schema.MV_COLUMSTORE_FILES
, orinformation_schema.MV_COLUMNAR_SEGMENT_INDEX
whileALTER TABLE … DROP INDEX …
is being run concurrently in order to drop a secondary columnstore index. - Added support for audit logging levels other than
ALL-RESULTS
mode when clients connect using server side prepared statements. Previously, enabling audit logging disabled use of prepared statements. - Fixed a crash that occurred when
ANY_VALUE(NULL)
is included in a query. - Fixed a crash that occurred where a query-typed variable accesses a table via a 3-part name (for example,
dbname.tablename.columnname
) and the database referred to does not exist. Blocked the creation of pipelines which refer to computed columns or columns set via a default value in theWHERE
clause. - Snapshots of the memsql system database are now taken more frequently. Previously, the memsql database was slower to recover on higher latency disk than user databases; more frequent snapshots speed up restart times.
2020-10-19 Version 7.0.23
- The
CHECK BLOB CHECKSUM
command now supports aWITH REPAIR
option that can repair file system corruptions as long as one copy of a corrupt columnstore blob file exists on the cluster. See more information in the CHECK BLOB CHECKSUM topic. - Fixed an issue that occurred when using server side prepared statements via the newer versions of the MariaDB client. The MySQL client is unaffected.
- Fixed a wrong result issue when using
IN
with a columnstore secondary hash index, which caused only the first of multiple values with the same hash in a segment to be output. - Fixed wrong result when using
GROUP_CONCAT
withORDER BY
in query type variables.
2020-08-24 Version 7.0.22
- Fixed an issue where
REPLICATE DATABASE
would fail in cases where no replicated database existed yet and the node containing the master aggregator was previously replaced with a new node. - Now,
SHOW PROCESSLIST
returns the IP address of the client (in the Host column) if the host name cannot be resolved. - Fixed an issue that occurred when the engine variable
interpreter_mode
was set tointerpret_first
, where the first run of the query remained in interpreted mode even after compilation had finished. - Added support for
ALTER PIPELINE ... SET RESOURCE POOL
. - Reduced the CPU usage required for many threads to allocate memory for variable length strings.
- Added a new engine variable
convert_nonunique_hash_to_skiplist
which, when set toON
, creates a skiplist index in place of any non-unique hash index. By default, the variable is set toOFF
. - Pipelines no longer stop executing if free disk space drops lower than
5 * <the value of the minimal_disk_space engine variable>
. They now stop executing if free disk space drops lower than400 mb + <the value of the minimal_disk_space engine variable>
. - Improved the error message returned when
REMOVE AGGREGATOR
is run on the master aggregator. - The behavior of the
max_compilation_time_s
engine variable no longer applies to DDL queries. Eliminating compilation timeouts for these queries, especially forALTER TABLE
, will prevent the queries from potentially running forever. - Fixed a compatibility issue that occurred when running newer versions of SAP Business Object Data Services (BODS) against MemSQL. The issue was that the command
SET sql_mode = <expression>
was not supported by MemSQL. - Fixed an issue where auto-attach would fail to attach a partition but succeed in bringing the leaf online, resulting in offline partitions.
- Ensured that any re-used existing replica partition is still forced in sync as part of the copy operation run as a step of a
REBALANCE
operation. Previously, this was a cause of failure in cases whereREBALANCE
failed once prior. - Fixed a failure when running a
ROLLUP
query over a constant expression. - Disallowed parameter names with
.
in them when defining a function or stored procedure.
2020-07-20 Version 7.0.21
- Fixed an issue where joins with reference tables are sometimes executed using Cartesian joins, even when faster joins are possible.
- Improved the performance of rebalance partition operations that copy partitions with many small columnstore BLOB files.
- Fixed an issue where the engine variable
explicit_defaults_for_timestamp
could be set to a different value on new nodes added to the cluster, as compared to the existing value already set on existing nodes. This would cause the error “partition’s table metadata are out of sync” when queries were run. ANALYZE TABLE
now requires theSELECT
and (INSERT
orALTER
) permissions to execute.- Improved the performance of batch deletes on columnstores. The increase in performance depends on the encoding of the columns.
- When
preserve_original_colstore_json
is set toON
, null values and empty arrays are preserved in JSON columns in columnstores.
2020-06-29 Version 7.0.20
- Fixed an issue where user-defined tables having the same name as system tables, such as
USERS
, were treated as case-sensitive whentable_name_case_sensitivity
was set toOFF
. - SSL ciphers that use elliptic curves are now supported.
- Added an optimizer setting that allows you to adjust the cost of performing a cross join.
- The
STATE
column in theinformation_schema.PIPELINES
view now shows a pipeline in an error state when the pipeline stops due to insufficient disk space. - Fixed an issue where running
STOP REPLICATING
would sometimes cause replica partitions, after becoming primary partitions, to replicate asynchronously when the original primary database used synchronous replication. - Improved predicate transitivity rewrites to avoid adding redundant predicates. This fixes compile timeouts for some queries with large numbers of predicates.
- Fixed an issue where filters were sometimes not being used as hash join conditions if they involved expressions. This was causing filters to be evaluated on more rows than needed.
2020-06-09 Version 7.0.19
- Correctly generate an error when running
ALTER VIEW
on a schema-bound view v1, where a schema-bound view v2 refers to v1. - Fixed an auto-attach failure in specific failure conditions where a node quickly got marked offline and then online by the aggregator, but the node itself did not restart.
- Improved the parallelism of heartbeat reconnects. This fixed erroneous node failures in cases of issues causing slow connections, when multiple nodes were unhealthy.
- Fixed a crash where
VALUES
clauses inINSERT ... SELECT ... ON DUPLICATE KEY UPDATE
referred to tables not being written to.
2020-05-26 Version 7.0.18
- Added the engine variable
json_extract_string_collation
. When the variable is set toAUTO
(the default setting) orJSON
, the extracted string has the same collation as the JSON object that is being extracted from. When the variable is set toserver
, the extracted string has the same collation as thecollation_server
variable. - Reduced memory usage in certain outer join queries that use single table or constant filters.
- Fixed an issue where views using set operations (such as
INTERSECT
andMINUS
) could be parsed incorrectly when being queried. - Fixed an issue with
INSERT .. SELECT
queries that use generator functions (functions that are computed on the aggregator, with the results sent to the leaves). - Fixed an intermittent incorrect permission denied error that occurred when views were used in queries on secondary clusters.
- Improved the performance of the query that is run on
information_schema.OPTIMIZER_STATISTICS
by MemSQL Ops schema monitoring.
2020-04-27 Version 7.0.16
- Added the
innodb_strict_mode
engine variable. Thedjango-mysql
library uses the value of this variable to connect to MemSQL. - Improved the efficiency of locking done by the security manager to allow more concurrent login/logout operations.
- Added an error message to return when a command fails due to a bug: “Operation
<operation name>
failed without specifying a detailed error message”. - Fixed an issue where the security checks done for temporary tables could result in an incorrect access denied error.
- When external authentication to MemSQL (via PAM, Kerberos, etc.) is slow, track the time needed to authenticate and add logging to the tracelog. This is done for troubleshooting purposes.
- Now, the
RESTORE
command can restore a 6.x backup to a later version, using a different database name than was specified when the database was backed up. - Fixed an issue where views could not be queried after being created, due to a parsing error.
2020-04-06 Version 7.0.15
- Fixed an issue where synchronous replication incorrectly consumed all of the disk space on the host containing either the master or replica partitions. The issue occurred when the host containing the replica partitions had a specific type of network connectivity problem with the host containing the master partitions.
- Fixed an issue that could cause an online upgrade to fail if a reference database needs to reprovision in the middle of the upgrade.
- Fixed an issue where incorrect reuse of query plans stored on disk could cause the error “No function with id 23003 exists”.
- You can specify that the
SHOW VARIABLES
command returns only cluster-wide variables. - Fixed a bug where dropping a secondary index on a columnstore table and then adding another secondary index on the table (via
ALTER TABLE
) causes the secondary indexes on the table to become corrupted after a node restart. - Fixed a bug where using
ALTER TABLE
to add a secondary index to a columnstore table, truncating the table, and then restarting the node would cause the added secondary index to be disabled and alters on the table to crash the node. - Added Google Cloud Storage (GCS) pipelines to the
information_schema.PIPELINES
view. - Added the
histogram
column to theinformation_schema.COLUMN_STATISTICS
view. - Added the
path
column to theinformation_schema.PLANCACHE
view. This column contains the path to the file where a query plan is stored on disk.
2020-03-23 Version 7.0.14
- Avoid inaccurate sampling estimates for expressions that use
CURRENT_SECURITY_ROLES()
. - Fixed a crash that occurred when
PERCENTILE_DISC()
andPERCENTILE_CONT()
were used with a sub-select. - Added
int96
support to Parquet pipelines. - When an
EXPLAIN
query runs, return a warning when the query is run on tables containing zero rows. - Fixed a crash that occurred during query rewrites for complex
UPDATE
andDELETE
queries having subselects. - Fixed an error that occurred when a plan was dropped from the plancache during a distributed join.
- Added clearer error messages when queries are unable to run due to workload manager or resource governor settings. Also, added better logging for these errors.
- Queries that are built from
TO_QUERY()
and query type values are parameterized when they run. Parameterization of these queries allows plans to be reused in the plancache. - Added support for pipelines that extract data from Google Cloud Storage (GCS).
- Fixed a locking issue that could cause dropping a temporary table to fail.
2020-03-09 Version 7.0.13
- Now, queries inside of stored procedures that use variables run as single partition queries if the queries only need to access data within single partitions. Previously, these types of queries did not run as single partition queries.
- Fixed an issue where
CLEAR ORPHAN DATABASES
did not handle table names that require escaping, such as table names prefixed with#
. - Now,
ANALYZE TABLE
incorporates the behavior of thenode_degree_of_parallelism
engine variable. - Now, call the
malloc_trim()
function less aggressively. This function shrinks the memory cached by libc malloc. - Added the columns
STORAGE_TYPE
andDISTRIBUTED
toinformation_schema.TABLES
. These columns were previously returned when runningSHOW TABLES
, only. - Now, improve the performance of queries that use hash outer joins with some types of table filters.
2020-02-18 Version 7.0.12
- Fixed an incorrect “Transaction rolled back mid-query” error that could occur during a multi-statement transaction if the transaction was started on at least 64 partitions.
- Fixed an error that occurred when
SELECT ... GROUP BY ... ORDER BY RAND()
was run on a columnstore table. - Now, increase the number of columns that
SELECT
queries can return. - Removed a limitation on the number of columns in a table allowed for columnstore sampling. Increased the number of keys allowed for optimized columnstore JSON storage in some cases.
- Fixed a deadlock that is possible when the
RESTORE DATABASE
command fails and attempts to remove the restored database partitions that were created before the command failed. - Now, cluster-wide operations that take a global lock, such as
ADD LEAF
andREBALANCE
, update theSTATE
column ofinformation_schema.PROCESSLIST
. The update indicates that a command is blocked on the lock.
2020-01-27 Version 7.0.11
- Added the
columnstore_sample_per_partition_limit
engine variable. This variable controls the maximum number of rows sampled per partition for columnstores. - Now, stop examining samples of columnstore tables that have many columns. Continue collecting samples for these types of tables.
- Fixed an issue with kerberos-related
CONFIG
clause keys inCREATE PIPELINE ...
being considered invalid by HDFS pipelines; added theallow_unknown_configs
option to such pipelines to skipCONFIG
clause validity checks. - Now, periodically shrink the memory cached by libc malloc by calling
malloc_trim()
. - Now, the
table_name_case_sensitivity
engine variable can be used to change whether MemSQL treats tables, views, and table aliases as case-sensitive or case-insensitive. This variable can only be changed if no user databases have been created on the cluster. - Fixed a crash that occurred when running
UPDATE ... SET ...
which sets a sparseVARCHAR
column that had been previously assigned to earlier in the sameUPDATE ... SET ...
query. Fixed the same issue that occurred when setting a sparseDECIMAL
column. - Fixed an issue where cross-database distributed joins could fail to cleanup the temporary result tables created to run the join.
2020-01-13 Version 7.0.10
- Now, the
REPLICATE DATABASE
command requires a primary database and its secondary database to have the same durability setting (SYNC
orASYNC
). - Fixed an issue that caused
STOP REPLICATING
to encounter an error after a database was upgraded from MemSQL 6.x to MemSQL 7.0. - Now,
LOAD DATA
incorporates the behavior of theignore_insert_into_computed_column
engine variable. - Fixed a crash that sometimes occurred when calling the
MAX
function on anENUM
column. - Now, HDFS Pipelines skip files in the data source that are under the
_temporary
directory. - Added the clause
NULL DEFINED BY ... OPTIONALLY ENCLOSED
to the syntax ofLOAD DATA ...
. When this clause is used,LOAD DATA
countsNULL
values enclosed with quotes asNULL
s. - Now, snapshot reference databases after
ALTER TABLE
is run. Also, snapshot all databases afterTRUNCATE
is run. - Fixed a memory leak in queries that use full-text indexes.
- Fixed an issue with database replication that occurred when the
snapshots_to_keep
engine variable is set to 1. - Now, delete the existing samples associated with a table when columnstore sampling on the table is disabled. Also, reset the sampling flag on all segments associated with the table so they will be resampled if columnstore sampling is re-enabled.
2019-12-10 Version 7.0.9
- Initial GA release of MemSQL 7.0