Outdated Version

You are viewing an older version of this section. View current production version.

7.0 Release Notes

The main features of the MemSQL 7.0 release are highlighted below.

Release Highlights

  • Much improved performance of synchronous replication and durability. Our synchronous replication now has only a minor performance penalty compared to our already fast asynchronous replication – giving you better data consistency guarantees without incurring large performance penalties. You can read more about our improved synchronous replication here.

  • Selective queries on columnstore tables now run faster, via sub-segment access.

  • Hash indexes on columnstore tables are now supported. CREATE TABLE and ALTER TABLE now support adding and dropping hash indexes on columnstore tables.

  • By default, UPDATE and DELETE queries now lock columnstore tables at the row level, allowing improved concurrency. Previously, these queries locked columnstore tables on the segment level.

  • Sparse rowstore compression is now supported.

  • Queries on columnstore tables can now reorder filters to decrease execution time. Reordering occurs automatically and allows filters that are more selective to be evaluated first.

Info

To deploy this release, follow the appropriate guide.

To upgrade a self-managed install to this release, follow this guide.

To make a backup of a database in this release or to restore a database backup to this release, follow this guide.

In addition to the release highlights noted above, the following new features and improvements are available in this release:

Query Execution

  • Improved Workload Management implementation that no longer needs to be tuned via the workload_management_expected_aggregators engine variable; this variable is now deprecated.
  • Support for multi-statement transactions that use distributed joins.
  • Support for multi-statement transactions that write to a database and then read from a different database.
  • Added a session variable node_degree_of_parallelism that specifies the number of threads per node to use for columnstore table scans.
  • Queries that use the IN filter on shard keys now perform better.
  • Hash joins that use reference tables now perform better.
  • The data_conversion_compatibility_level engine variable can now be set to 7.0. This allows errors to be thrown for more invalid type conversions, instead of doing implicit type conversions, where the user could see unexpected results.
  • Now, a maximum of 1024 tables can be written to in a transaction.

Query Optimization

  • Column statistics are now automatically gathered on rowstore tables. This is in addition to column statistics being automatically gathered on columnstore tables, which was done previously.
  • Range statistics (histograms) are now automatically gathered on rowstore and columnstore tables.
  • Rowstore table sampling has been improved.
  • Now, a true row-level random sample of rows from columnstore tables is maintained automatically and used to estimate the selectivity of complex predicates.
  • All shapes of bushy joins are now supported.
  • Common Table Expressions (CTEs) can now be materialized; redundant expressions can use cached data instead of being recomputed.

Data Storage, Loading, and Backup

  • Performance improvements for bulk inserts, large insert/selects, LOAD DATA, and pipelines to columnstore tables (particularly for wide columnstore tables).
  • Added support for publishing data to Kafka via SELECT … INTO KAFKA ….
  • Added support for Parquet in CREATE PIPELINE.
  • Improved error reporting for pipelines and LOAD DATA.
  • Incremental backup for columnstore data.

System Management

  • Added a column to information_schema.TABLE_STATISTICS to indicate if a table is rowstore or a columnstore.
  • Added columns to the MV_QUERY_ACTIVITIES and MV_QUERY_ACTIVITIES_EXTENDED_CUMULATIVE views.
  • Added the MV_GLOBAL_STATUS, MV_GLOBAL_VARIABLES, MV_SYSINFO_CPU, MV_SYSINFO_CPU_DISK, MV_SYSINFO_CPU_LIST, MV_SYSINFO_MEM, and MV_SYSINFO_NET views.

Functional Extensions

  • Cross-database views are now supported.
  • Added built-in support for time series reporting through the new FIRST, LAST, and TIME_BUCKET functions.
  • GROUP_CONCAT() now supports the ORDER BY clause.
  • Added the MEDIAN() aggregate function.

Maintenance Release Changelog

2020-12-07 Version 7.0.24

  • Fixed a crash caused by timestamp values during daylight savings time changes in columnstore tables using a timestamp as a sort key or unique key.
  • Fixed a crash that occurred when querying information_schema.COLUMNAR_SEGMENTS, information_schema.MV_COLUMSTORE_FILES, or information_schema.MV_COLUMNAR_SEGMENT_INDEX while ALTER TABLE … DROP INDEX … is being run concurrently in order to drop a secondary columnstore index.
  • Added support for audit logging levels other than ALL-RESULTS mode when clients connect using server side prepared statements. Previously, enabling audit logging disabled use of prepared statements.
  • Fixed a crash that occurred when ANY_VALUE(NULL) is included in a query.
  • Fixed a crash that occurred where a query-typed variable accesses a table via a 3-part name (for example, dbname.tablename.columnname) and the database referred to does not exist. Blocked the creation of pipelines which refer to computed columns or columns set via a default value in the WHERE clause.
  • Snapshots of the memsql system database are now taken more frequently. Previously, the memsql database was slower to recover on higher latency disk than user databases; more frequent snapshots speed up restart times.

2020-10-19 Version 7.0.23

  • The CHECK BLOB CHECKSUM command now supports a WITH REPAIR option that can repair file system corruptions as long as one copy of a corrupt columnstore blob file exists on the cluster. See more information in the CHECK BLOB CHECKSUM topic.
  • Fixed an issue that occurred when using server side prepared statements via the newer versions of the MariaDB client. The MySQL client is unaffected.
  • Fixed a wrong result issue when using IN with a columnstore secondary hash index, which caused only the first of multiple values with the same hash in a segment to be output.
  • Fixed wrong result when using GROUP_CONCAT with ORDER BY in query type variables.

2020-08-24 Version 7.0.22

  • Fixed an issue where REPLICATE DATABASE would fail in cases where no replicated database existed yet and the node containing the master aggregator was previously replaced with a new node.
  • Now, SHOW PROCESSLIST returns the IP address of the client (in the Host column) if the host name cannot be resolved.
  • Fixed an issue that occurred when the engine variable interpreter_mode was set to interpret_first, where the first run of the query remained in interpreted mode even after compilation had finished.
  • Added support for ALTER PIPELINE ... SET RESOURCE POOL.
  • Reduced the CPU usage required for many threads to allocate memory for variable length strings.
  • Added a new engine variable convert_nonunique_hash_to_skiplist which, when set to ON, creates a skiplist index in place of any non-unique hash index. By default, the variable is set to OFF.
  • Pipelines no longer stop executing if free disk space drops lower than 5 * <the value of the minimal_disk_space engine variable>. They now stop executing if free disk space drops lower than 400 mb + <the value of the minimal_disk_space engine variable>.
  • Improved the error message returned when REMOVE AGGREGATOR is run on the master aggregator.
  • The behavior of the max_compilation_time_s engine variable no longer applies to DDL queries. Eliminating compilation timeouts for these queries, especially for ALTER TABLE, will prevent the queries from potentially running forever.
  • Fixed a compatibility issue that occurred when running newer versions of SAP Business Object Data Services (BODS) against MemSQL. The issue was that the command SET sql_mode = <expression> was not supported by MemSQL.
  • Fixed an issue where auto-attach would fail to attach a partition but succeed in bringing the leaf online, resulting in offline partitions.
  • Ensured that any re-used existing replica partition is still forced in sync as part of the copy operation run as a step of a REBALANCE operation. Previously, this was a cause of failure in cases where REBALANCE failed once prior.
  • Fixed a failure when running a ROLLUP query over a constant expression.
  • Disallowed parameter names with . in them when defining a function or stored procedure.

2020-07-20 Version 7.0.21

  • Fixed an issue where joins with reference tables are sometimes executed using Cartesian joins, even when faster joins are possible.
  • Improved the performance of rebalance partition operations that copy partitions with many small columnstore BLOB files.
  • Fixed an issue where the engine variable explicit_defaults_for_timestamp could be set to a different value on new nodes added to the cluster, as compared to the existing value already set on existing nodes. This would cause the error “partition’s table metadata are out of sync” when queries were run.
  • ANALYZE TABLE now requires the SELECT and (INSERT or ALTER) permissions to execute.
  • Improved the performance of batch deletes on columnstores. The increase in performance depends on the encoding of the columns.
  • When preserve_original_colstore_json is set to ON, null values and empty arrays are preserved in JSON columns in columnstores.

2020-06-29 Version 7.0.20

  • Fixed an issue where user-defined tables having the same name as system tables, such as USERS, were treated as case-sensitive when table_name_case_sensitivity was set to OFF.
  • SSL ciphers that use elliptic curves are now supported.
  • Added an optimizer setting that allows you to adjust the cost of performing a cross join.
  • The STATE column in the information_schema.PIPELINES view now shows a pipeline in an error state when the pipeline stops due to insufficient disk space.
  • Fixed an issue where running STOP REPLICATING would sometimes cause replica partitions, after becoming primary partitions, to replicate asynchronously when the original primary database used synchronous replication.
  • Improved predicate transitivity rewrites to avoid adding redundant predicates. This fixes compile timeouts for some queries with large numbers of predicates.
  • Fixed an issue where filters were sometimes not being used as hash join conditions if they involved expressions. This was causing filters to be evaluated on more rows than needed.

2020-06-09 Version 7.0.19

  • Correctly generate an error when running ALTER VIEW on a schema-bound view v1, where a schema-bound view v2 refers to v1.
  • Fixed an auto-attach failure in specific failure conditions where a node quickly got marked offline and then online by the aggregator, but the node itself did not restart.
  • Improved the parallelism of heartbeat reconnects. This fixed erroneous node failures in cases of issues causing slow connections, when multiple nodes were unhealthy.
  • Fixed a crash where VALUES clauses in INSERT ... SELECT ... ON DUPLICATE KEY UPDATE referred to tables not being written to.

2020-05-26 Version 7.0.18

  • Added the engine variable json_extract_string_collation. When the variable is set to AUTO (the default setting) or JSON, the extracted string has the same collation as the JSON object that is being extracted from. When the variable is set to server, the extracted string has the same collation as the collation_server variable.
  • Reduced memory usage in certain outer join queries that use single table or constant filters.
  • Fixed an issue where views using set operations (such as INTERSECT and MINUS) could be parsed incorrectly when being queried.
  • Fixed an issue with INSERT .. SELECT queries that use generator functions (functions that are computed on the aggregator, with the results sent to the leaves).
  • Fixed an intermittent incorrect permission denied error that occurred when views were used in queries on secondary clusters.
  • Improved the performance of the query that is run on information_schema.OPTIMIZER_STATISTICS by MemSQL Ops schema monitoring.

2020-04-27 Version 7.0.16

  • Added the innodb_strict_mode engine variable. The django-mysql library uses the value of this variable to connect to MemSQL.
  • Improved the efficiency of locking done by the security manager to allow more concurrent login/logout operations.
  • Added an error message to return when a command fails due to a bug: “Operation <operation name> failed without specifying a detailed error message”.
  • Fixed an issue where the security checks done for temporary tables could result in an incorrect access denied error.
  • When external authentication to MemSQL (via PAM, Kerberos, etc.) is slow, track the time needed to authenticate and add logging to the tracelog. This is done for troubleshooting purposes.
  • Now, the RESTORE command can restore a 6.x backup to a later version, using a different database name than was specified when the database was backed up.
  • Fixed an issue where views could not be queried after being created, due to a parsing error.

2020-04-06 Version 7.0.15

  • Fixed an issue where synchronous replication incorrectly consumed all of the disk space on the host containing either the master or replica partitions. The issue occurred when the host containing the replica partitions had a specific type of network connectivity problem with the host containing the master partitions.
  • Fixed an issue that could cause an online upgrade to fail if a reference database needs to reprovision in the middle of the upgrade.
  • Fixed an issue where incorrect reuse of query plans stored on disk could cause the error “No function with id 23003 exists”.
  • You can specify that the SHOW VARIABLES command returns only cluster-wide variables.
  • Fixed a bug where dropping a secondary index on a columnstore table and then adding another secondary index on the table (via ALTER TABLE) causes the secondary indexes on the table to become corrupted after a node restart.
  • Fixed a bug where using ALTER TABLE to add a secondary index to a columnstore table, truncating the table, and then restarting the node would cause the added secondary index to be disabled and alters on the table to crash the node.
  • Added Google Cloud Storage (GCS) pipelines to the information_schema.PIPELINES view.
  • Added the histogram column to the information_schema.COLUMN_STATISTICS view.
  • Added the path column to the information_schema.PLANCACHE view. This column contains the path to the file where a query plan is stored on disk.

2020-03-23 Version 7.0.14

  • Avoid inaccurate sampling estimates for expressions that use CURRENT_SECURITY_ROLES().
  • Fixed a crash that occurred when PERCENTILE_DISC() and PERCENTILE_CONT() were used with a sub-select.
  • Added int96 support to Parquet pipelines.
  • When an EXPLAIN query runs, return a warning when the query is run on tables containing zero rows.
  • Fixed a crash that occurred during query rewrites for complex UPDATE and DELETE queries having subselects.
  • Fixed an error that occurred when a plan was dropped from the plancache during a distributed join.
  • Added clearer error messages when queries are unable to run due to workload manager or resource governor settings. Also, added better logging for these errors.
  • Queries that are built from TO_QUERY() and query type values are parameterized when they run. Parameterization of these queries allows plans to be reused in the plancache.
  • Added support for pipelines that extract data from Google Cloud Storage (GCS).
  • Fixed a locking issue that could cause dropping a temporary table to fail.

2020-03-09 Version 7.0.13

  • Now, queries inside of stored procedures that use variables run as single partition queries if the queries only need to access data within single partitions. Previously, these types of queries did not run as single partition queries.
  • Fixed an issue where CLEAR ORPHAN DATABASES did not handle table names that require escaping, such as table names prefixed with #.
  • Now, ANALYZE TABLE incorporates the behavior of the node_degree_of_parallelism engine variable.
  • Now, call the malloc_trim() function less aggressively. This function shrinks the memory cached by libc malloc.
  • Added the columns STORAGE_TYPE and DISTRIBUTED to information_schema.TABLES. These columns were previously returned when running SHOW TABLES, only.
  • Now, improve the performance of queries that use hash outer joins with some types of table filters.

2020-02-18 Version 7.0.12

  • Fixed an incorrect “Transaction rolled back mid-query” error that could occur during a multi-statement transaction if the transaction was started on at least 64 partitions.
  • Fixed an error that occurred when SELECT ... GROUP BY ... ORDER BY RAND() was run on a columnstore table.
  • Now, increase the number of columns that SELECT queries can return.
  • Removed a limitation on the number of columns in a table allowed for columnstore sampling. Increased the number of keys allowed for optimized columnstore JSON storage in some cases.
  • Fixed a deadlock that is possible when the RESTORE DATABASE command fails and attempts to remove the restored database partitions that were created before the command failed.
  • Now, cluster-wide operations that take a global lock, such as ADD LEAF and REBALANCE, update the STATE column of information_schema.PROCESSLIST. The update indicates that a command is blocked on the lock.

2020-01-27 Version 7.0.11

  • Added the columnstore_sample_per_partition_limit engine variable. This variable controls the maximum number of rows sampled per partition for columnstores.
  • Now, stop examining samples of columnstore tables that have many columns. Continue collecting samples for these types of tables.
  • Fixed an issue with kerberos-related CONFIG clause keys in CREATE PIPELINE ... being considered invalid by HDFS pipelines; added the allow_unknown_configs option to such pipelines to skip CONFIG clause validity checks.
  • Now, periodically shrink the memory cached by libc malloc by calling malloc_trim().
  • Now, the table_name_case_sensitivity engine variable can be used to change whether MemSQL treats tables, views, and table aliases as case-sensitive or case-insensitive. This variable can only be changed if no user databases have been created on the cluster.
  • Fixed a crash that occurred when running UPDATE ... SET ... which sets a sparse VARCHAR column that had been previously assigned to earlier in the same UPDATE ... SET ... query. Fixed the same issue that occurred when setting a sparse DECIMAL column.
  • Fixed an issue where cross-database distributed joins could fail to cleanup the temporary result tables created to run the join.

2020-01-13 Version 7.0.10

  • Now, the REPLICATE DATABASE command requires a primary database and its secondary database to have the same durability setting (SYNC or ASYNC).
  • Fixed an issue that caused STOP REPLICATING to encounter an error after a database was upgraded from MemSQL 6.x to MemSQL 7.0.
  • Now, LOAD DATA incorporates the behavior of the ignore_insert_into_computed_column engine variable.
  • Fixed a crash that sometimes occurred when calling the MAX function on an ENUM column.
  • Now, HDFS Pipelines skip files in the data source that are under the _temporary directory.
  • Added the clause NULL DEFINED BY ... OPTIONALLY ENCLOSED to the syntax of LOAD DATA .... When this clause is used, LOAD DATA counts NULL values enclosed with quotes as NULLs.
  • Now, snapshot reference databases after ALTER TABLE is run. Also, snapshot all databases after TRUNCATE is run.
  • Fixed a memory leak in queries that use full-text indexes.
  • Fixed an issue with database replication that occurred when the snapshots_to_keep engine variable is set to 1.
  • Now, delete the existing samples associated with a table when columnstore sampling on the table is disabled. Also, reset the sampling flag on all segments associated with the table so they will be resampled if columnstore sampling is re-enabled.

2019-12-10 Version 7.0.9

  • Initial GA release of MemSQL 7.0