You are viewing an older version of this section. View current production version.
Data Types
SingleStore DB supports the following data types. The sizes given are for in-memory rowstore tables. Columnstore (on-disk) tables are stored differently, and often compressed.
Integer Numbers
Data Type | Size | Size (Not Null) | Synonyms | Min Value | Max Value |
---|---|---|---|---|---|
BOOL* (see note below) | 2 bytes | 1 byte | BOOLEAN | -128 | 127 |
TINYINT | 2 bytes | 1 byte | -128 | 127 | |
SMALLINT | 4 bytes | 2 bytes | -32768 | 32767 | |
MEDIUMINT | 4 bytes | 3 bytes | -8388608 | 8388607 | |
INT | 8 bytes | 4 bytes | INTEGER | -2147483648 | 2147483647 |
BIGINT | 12 bytes | 8 bytes | -2 ** 63 | (2 ** 63) - 1 |
Remarks
BOOL
and BOOLEAN
are synonymous with TINYINT
. A value of 0
is considered FALSE
, non-zero values are considered TRUE
.
The format: INT(x)
(for example, INT(5)
) is used to specify display width and not the size of the integer. Display width is not directly used within SingleStore DB but may be used by some clients.
Each integer type can be “unsigned”, doubling the positive range of values and disallowing values under 0.
Real Numbers
Data Type | Size | Size (Not Null) | Synonyms | Precision |
---|---|---|---|---|
FLOAT | 4 bytes | 4 bytes | 23 bits | |
DOUBLE | 8 bytes | 8 bytes | REAL | 53 bits |
DECIMAL | see note | see note | DEC, FIXED, NUMERIC | 65 digits |
The space consumed by the DECIMAL
datatype varies with the precision and scale arguments. The syntax for defining a DECIMAL
(or its synonyms DEC
, FIXED
, and NUMERIC
) is in the form DECIMAL(P, S)
where P is the total number of digits (precision) and S (scale) is how many of those digits appear after the decimal point. For example, DECIMAL(10,5)
will store numbers with 5 digits before the decimal and 5 after.
The formula for the number of bytes used to store a decimal value is (ceil((P - S) / 9) + ceil(S / 9)) * 4
. So, for our example, it would be (ceil((10 - 5) / 9) + ceil(5 / 9)) * 4
, or 8 bytes.
For FLOAT
and DOUBLE
, precision refers to the number of bits in the significand.
Working with Floats and Doubles
FLOAT
and DOUBLE
represent approximate values and SingleStore DB follows IEEE 754 standards (see FLOAT
| see DOUBLE
) for floating point numbers.
The precision listed in the Real Numbers table refers to the number of bits used to store the significand. For FLOAT
, 23 bits is about 6 digits. For larger numbers (up to 53 bits) use DOUBLE
, which stores up to 14 digits.
Numbers exceeding the precision of the data type are rounded according to the IEEE 754 standard when entered. For example:
CREATE TABLE t (id float);
INSERT INTO t VALUES (123456);
INSERT INTO t VALUES (1234567);
SELECT * FROM t;
****
+---------+
| id |
+---------+
| 123456 |
| 1234570 |
+---------+
Time and Date
Data Type | Size | Size (Not Null) | Resolution | Min Value | Max Value |
---|---|---|---|---|---|
DATE | 8 bytes | 4 bytes | day | 1000-01-01 | 9999-12-31 |
TIME | 8 bytes | 4 bytes | second | -838:59:59 | 838:59:59 |
DATETIME | 12 bytes | 8 bytes | second | 1000-01-01 00:00:00 | 9999-12-31 23:59:59 |
DATETIME(6) | 12 bytes | 8 bytes | microsecond | 1000-01-01 00:00:00.000000 | 9999-12-31 23:59:59.999999 |
TIMESTAMP | 4 bytes | 4 bytes | second | 1970-01-01 00:00:01 | 2038-01-19 03:14:07 |
TIMESTAMP(6) | 8 bytes | 8 bytes | microsecond | 1970-01-01 00:00:01.000000 | 2038-01-19 03:14:07.999999 |
YEAR | 2 bytes | 1 byte | year | 1901 | 2155 |
Note: It is recommended to use DATETIME
or DATETIME(6)
column types instead of TIMESTAMP
or TIMESTAMP(6)
because TIMESTAMP
types only
support dates through 2038-01-19
as opposed to 9999-12-31
for DATETIME
types. This can avoid the need to do potentially time-consuming
application and database maintenance and conversion in the future.
Inserting or Updating a Column With the Current Timestamp
To allow inserts or updates of a TIMESTAMP
, TIMESTAMP(6)
, DATETIME
, or a DATETIME(6)
column with the current timestamp, define the column with the clause:
-
DEFAULT <timefunction>
, to insert a column with the current timestamp. -
ON UPDATE <timefunction>
, to update a column with the current timestamp. -
DEFAULT <timefunction> ON UPDATE <timefunction>
, to insert and update a column with the current timestamp.
where <timefunction>
is one of the following values: NOW()
, NOW(6)
, CURRENT_TIMESTAMP()
, CURRENT_TIMESTAMP(6)
.
Examples
The following example demonstrates the use of DEFAULT CURRENT_TIMESTAMP()
in column definition:
CREATE TABLE s(id INT, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP());
INSERT INTO s(id) VALUES(1);
INSERT INTO s(id) VALUES(2);
SELECT * FROM s;
****
+------+---------------------+
| id | ts |
+------+---------------------+
| 1 | 2020-05-21 09:12:25 |
| 2 | 2020-05-21 09:12:25 |
+------+---------------------+
The following example demonstrates the use of ON UPDATE CURRENT_TIMESTAMP(6)
in column definition:
CREATE TABLE s1(id INT, ts DATETIME(6) ON UPDATE CURRENT_TIMESTAMP(6));
INSERT INTO s1(id,ts) VALUES(1,"2017-07-01");
SELECT * FROM s1;
****
+------+----------------------------+
| id | ts |
+------+----------------------------+
| 1 | 2017-07-01 00:00:00.000000 |
+------+----------------------------+
UPDATE s1 SET id = 2 WHERE id = 1;
SELECT * FROM s1;
****
+------+----------------------------+
| id | ts |
+------+----------------------------+
| 2 | 2020-05-21 09:36:47.124979 |
+------+----------------------------+
The following example demonstrates the use of DEFAULT NOW(6) ON UPDATE NOW(6)
in column definition:
CREATE TABLE s2(id INT, ts DATETIME(6) DEFAULT NOW(6) ON UPDATE NOW(6));
INSERT INTO s2(id) VALUES(1);
INSERT INTO s2(id) VALUES(2);
SELECT * FROM s2;
****
+------+----------------------------+
| id | ts |
+------+----------------------------+
| 2 | 2020-05-21 11:21:30.998275 |
| 1 | 2020-05-21 11:21:30.992868 |
+------+----------------------------+
UPDATE s2 SET id = 2 WHERE id = 1;
SELECT * FROM s2;
****
+------+----------------------------+
| id | ts |
+------+----------------------------+
| 2 | 2020-05-21 11:21:30.998275 |
| 2 | 2020-05-21 11:23:39.134793 |
+------+----------------------------+
Prior to MemSQL/SingleStore 7, when you inserted or updated a record, the first TIMESTAMP
or TIMESTAMP(6)
column would automatically be populated with the current timestamp. Such columns did not use DEFAULT CURRENT_TIMESTAMP()
or ON UPDATE CURRENT_TIMESTAMP()
clause in their column definitions. In MemSQL/SingleStore 7 and later, this behavior has been deprecated; TIMESTAMP
or TIMESTAMP(6)
columns without these clauses will be populated with a NULL
value instead of the current timestamp. For instance, NULL
values are populated in the ts
column in the following example, when run on MemSQL/SingleStore 7 or later.
CREATE TABLE r(d DATETIME(6), ts TIMESTAMP(6), id INT);
INSERT r(d, id) VALUES("2017-07-01", 1);
SELECT * FROM r;
****
+----------------------------+----------------------------+------+
| d | ts | id |
+----------------------------+----------------------------+------+
| 2017-07-01 00:00:00.000000 | NULL | 1 |
+----------------------------+----------------------------+------+
UPDATE r SET id = 2 WHERE id = 1;
SELECT * FROM r;
****
+----------------------------+----------------------------+------+
| d | ts | id |
+----------------------------+----------------------------+------+
| 2017-07-01 00:00:00.000000 | NULL | 2 |
+----------------------------+----------------------------+------+
The explicit_defaults_for_timestamp Variable
The variable explicit_defaults_for_timestamp
determines if certain nonstandard behaviors for default values and NULL-value are disabled for the TIMESTAMP
column. By default, the variable is enabled, which disables the nonstandard behaviors.
If the explicit_defaults_for_timestamp
variable is enabled, SingleStore DB handles the TIMESTAMP
column behavior as follows:
-
Assigning a
TIMESTAMP
column value ofNULL
does not set it to the current timestamp. To set to the current timestamp, set theTIMESTAMP
column toCURRENT_TIMESTAMP()
orNOW()
. -
If the
TIMESTAMP
column is not declared asNOT NULL
, then it is automatically declared with theNULL
attribute and permitsNULL
values. Assigning a value ofNULL
to the column sets it toNULL
, not the current timestamp. -
TIMESTAMP
column declared with theNOT NULL
attribute do not permitNULL
values. -
TIMESTAMP
columns explicitly declared with theNOT NULL
attribute and without an explicitDEFAULT
attribute are treated as having no default value. -
To declare
TIMESTAMP
column with theDEFAULT CURRENT_TIMESTAMP()
orON UPDATE CURRENT_TIMESTAMP()
attributes, these must be explicitly specified. -
The first
TIMESTAMP
column in a table is handled in the same way as theTIMESTAMP
columns following the first one.
If the explicit_defaults_for_timestamp
variable is disabled, SingleStore DB handles the TIMESTAMP
column behavior as follows:
-
If the
TIMESTAMP
column is not declared asNULL
, then it is automatically declared with theNOT NULL
attribute. Assigning a value ofNULL
to the column is permitted and sets the column to the current timestamp. -
The first
TIMESTAMP
column in a table, if not declared asNULL
,DEFAULT CURRENT_TIMESTAMP()
orON UPDATE CURRENT_TIMESTAMP()
, is automatically declared with theDEFAULT CURRENT_TIMESTAMP()
andON UPDATE CURRENT_TIMESTAMP()
attributes. -
TIMESTAMP
columns following the first one, if not declared with theNULL
attribute or aDEFAULT
attribute, are automatically declared asDEFAULT
‘0000-00-00 00:00:00’ (the “zero” timestamp).
String Types
Fixed-Length String
There are three fixed-size string types in SingleStore DB. The space consumed is directly proportional to their length definitions. Values of type CHAR
require 3 x LENGTH
bytes of memory, as SingleStore DB supports only the UTF8 character set. Values of type BINARY
require LENGTH
bytes of memory. char(0) not null
and binary(0) not null
still allocate 1 byte.
Data Type | Size | Size (Not Null) |
---|---|---|
CHAR(LENGTH) | 3*LENGTH + 1 bytes | 3*LENGTH bytes |
BINARY(LENGTH) | LENGTH + 1 bytes | LENGTH bytes |
A column declared as CHAR()
stores values as per the defined length. When CHAR
values are stored, they are right-padded with spaces to the specified length but when the CHAR
values are retrieved, trailing spaces are removed.
CREATE TABLE t1 (c1 CHAR(3));
INSERT INTO t1 VALUES (' ');
INSERT INTO t1 VALUES ('ab ');
INSERT INTO t1 VALUES ('abcd');
SELECT c1, LENGTH (c1) FROM t1;
****
+------+-------------+
| c1 | LENGTH (c1) |
+------+-------------+
| abc | 3 |
| ab | 2 |
| | 0 |
+------+-------------+
Variable-Length Strings
There are several variable-length string types. The data for these variable types is stored in a memory allocation designed for them, and pointers to its locations in memory are stored in the main table structure. This means that the space consumed by a particular value is the byte length of the string, plus an 8-byte overhead for the pointer, and a 4-byte overhead for storing the length. (There is no additional 4-byte overhead if the field is nullable.) One exception is values less than 7 bytes long. Those small values are stored directly in the pointer area.
The three main types are VARCHAR
, VARBINARY
, and LONGTEXT
. While they all operate similarly for most English-language alphanumeric strings, there are several important differences in their behavior. The length limit of the VARCHAR
is measured in Unicode characters, not in bytes. VARBINARY
fields store data verbatim, regardless of whether they are valid Unicode strings. Sorting and comparing a VARBINARY
uses the exact binary representation.
To reduce fragmentation, allocations for variable sized data use sizes from a fixed list in multiples of eight (16, 32, 40, etc.). Variable data of other lengths is rounded up to the next nearest, so a ten (10) byte string would allocate sixteen (16) bytes.
Data Type | Max Length |
---|---|
VARCHAR | 21,845 characters |
VARBINARY | system RAM |
LONGTEXT | 4 GB |
There are several variants of the BLOB and TEXT types, listed below. Aside from their maximum lengths, there is no practical difference between these types in SingleStore DB, including performance or memory efficiency. They are implemented solely for backwards compatibility with MySQL syntax.
Name | Max Length | Description |
---|---|---|
LONGBLOB | 4 GB | VARBINARY |
MEDIUMBLOB | 16 MB | VARBINARY |
BLOB | 65,535 bytes | VARBINARY |
TINYBLOB | 255 bytes | VARBINARY |
MEDIUMTEXT | 16 MB | LONGTEXT |
TEXT | 65,535 bytes | LONGTEXT |
TINYTEXT | 255 bytes | LONGTEXT |
Comparison operators treat trailing spaces differently when used with variable length TEXT
type columns such as VARCHAR()
, TEXT()
, MEDIUMTEXT()
, and TINYTEXT()
. This behavior does not apply to BLOB
type columns. Comparison operators ignore the trailing spaces at the end of the string being compared, except for the LIKE
operator. The LIKE
operator does not ignore the trailing spaces and considers them while comparing the strings.
The following example demonstrates how comparison operators treat trailing spaces for a VARCHAR()
column. Consider a table Employee
as follows:
DESCRIBE Employee;
****
+----------+-------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+------+---------+-------+
| Location | varchar(10) | YES | | NULL | |
+----------+-------------+------+------+---------+-------+
Insert a few values in the Location
column, with and without trailing spaces:
INSERT INTO Employee VALUES ('BERLIN '),('BERLIN '),('BERLIN'),('BERLIN ');
SELECT LENGTH(Location),CONCAT('[', Location, ']') FROM Employee;
****
+------------------+----------------------------+
| LENGTH(Location) | CONCAT('[', Location, ']') |
+------------------+----------------------------+
| 6 | [Berlin] |
| 8 | [Berlin ] |
| 7 | [Berlin ] |
| 7 | [Berlin ] |
+------------------+----------------------------+
The following SELECT
statements ignore the trailing spaces and return the same output:
SELECT LENGTH(Location),CONCAT('[', Location, ']') FROM Employee WHERE Location = 'Berlin ';
SELECT LENGTH(Location),CONCAT('[', Location, ']') FROM Employee WHERE Location = 'Berlin';
SELECT LENGTH(Location),CONCAT('[', Location, ']') FROM Employee WHERE Location = 'Berlin ';
****
+------------------+----------------------------+
| LENGTH(Location) | CONCAT('[', Location, ']') |
+------------------+----------------------------+
| 6 | [Berlin] |
| 8 | [Berlin ] |
| 7 | [Berlin ] |
| 7 | [Berlin ] |
+------------------+----------------------------+
An exception is the LIKE
comparison operator, where trailing spaces are significant while comparison.
SELECT LENGTH(Location), CONCAT('[', Location, ']') FROM Employee WHERE Location LIKE 'Berlin';
****
+------------------+----------------------------+
| LENGTH(Location) | CONCAT('[', Location, ']') |
+------------------+----------------------------+
| 6 | [Berlin] |
+------------------+----------------------------+
JSON Type
The JSON type can contain any valid Javascript Object Notation data, including maps, arrays, and nested types. For more information see the JSON Guide .
Data Type | Size | Size (Not Null) |
---|---|---|
JSON | 20 bytes + data size | 16 bytes + data size |
Geospatial Types
Geospatial types contain information about points, paths, and polygons on the surface of the Earth. For more information, see the Geospatial Guide.
Data Type | Size | Size (Not Null) |
---|---|---|
GEOGRAPHYPOINT | 8 bytes | 8 bytes |
GEOGRAPHY | See note |
The space consumed by the path and polygon data in Geography
columns is dependent on the complexity of the objects. Very roughly, a Geography
value consumes 24 bytes per vertex, plus a 64 byte overhead, plus some minor overhead for each “ring” in a multipolygon.
Indexing a Geography
object may also consume more space than indexes over other datatypes. The rule of thumb is 40 bytes * the value of the RESOLUTION parameter of the index, per row.
SingleStore DB (MemSQL) Procedural SQL (MPSQL) Types
Two data types are only supported in SingleStore DB (MemSQL) Procedural SQL (MPSQL)
scalar-valued functions and stored procedures: ARRAY
and RECORD
.
The QUERY
data type is only supported in stored procedures.
For more information about these types, see the
ARRAY,
RECORD, and
QUERY topics.
Other Types
Data Type | Size | Size (Not Null) | Max cardinality | Max values per row |
---|---|---|---|---|
ENUM | 4 bytes | 2 bytes | 65,535 unique values | 1 |
SET | 12 bytes | 8 bytes | 64 unique values | 64 |
BIT | 9 bytes | 8 bytes |
Using the ENUM Data Type
An ENUM
is a string value chosen from a user-defined list of values that are specified when creating a table. The values in an ENUM
list must be quoted strings, which are distinct from others in the list. A column can be defined as ENUM
using the following syntax:
CREATE TABLE table_name (
---
col_name ENUM('val1','val2',...,'valN')
)
The values in an ENUM
list are indexed, starting from 1
. Consider a column defined as mode ENUM('Primary','Secondary','Tertiary')
. The following table displays the index for each possible value that the column can have.
Value | Index |
---|---|
NULL |
NULL |
'’ | 0 |
‘Primary’ | 1 |
‘Secondary’ | 2 |
‘Tertiary’ | 3 |
The index of a NULL
value is NULL
. An empty string denotes an invalid value, and it has an index of 0
. You can use the following query to identify the rows with invalid ENUM
value assignments:
SELECT ...
WHERE enum_column = 0;
The index of any ENUM
value denotes its position in the list of values. It has no relation with table indexes.
ENUM
values cannot be expressions or user-defined variables. In non-strict mode, an invalid value is inserted as an empty string. In strict mode, trying to insert an invalid values returns an error. ENUM
values are sorted based on their index values. The NULL
values are sorted first, then the empty strings (with index 0
), followed by the values in the ENUM
list.
The following example shows how ENUM
values are defined and used.
CREATE TABLE delStat
( ID VARCHAR(3),
Status ENUM('NOT_DISPATCHED','IN_TRANSIT','DELIVERED'));
INSERT INTO delStat VALUES('xdy',1);
INSERT INTO delStat VALUES('tsg','IN_TRANSIT');
INSERT INTO delStat(ID) VALUES('qfc');
SELECT * FROM delStat ORDER BY Status ASC;
****
+------+----------------+
| ID | Status |
+------+----------------+
| qfc | NULL |
| xdy | NOT_DISPATCHED |
| tsg | IN_TRANSIT |
+------+----------------+
This example shows that if an integer values is specified, the ENUM
value at that index is inserted. Because the NOT NULL
constraint was not applied on the column, the default value inserted is NULL
.
Using the SET Data Type
The SET data type restricts the values that can be inserted for a table column, and can have zero or more string values. However, the string value must exist in the list of values specified for the column at the time of table creation.
A SET column value can contain multiple set members, separated by commas. In such a case,
- the SET member values should not individually contain commas.
- the SET member values can be inserted in any order. The results are retrieved in the same order as listed at the time of table creation.
- even if a SET member value is repeated, the results display the value only once.
The following example displays the usage of the SET data type:
Create a table column with SET data type.
CREATE TABLE setexample (col SET('v1', 'v2', 'v3', 'v4'));
Insert column values as ‘v1,v2’, ‘v2,v1’, ‘v1,v4,v4’, and ‘v3,v4,v3’.
INSERT INTO setexample (col) VALUES ('v1,v2'), ('v2,v1'), ('v1,v4,v4'), ('v3,v4,v3');
The result is retrieved in the same order in which the SET member values were listed during table creation and duplicate values are ignored:
SELECT col FROM setexample;
****
+------------------------------+
| col |
+------------------------------+
| v1,v2 |
| v1,v2 |
| v1,v4 |
| v3,v4 |
+------------------------------+
Using the BIT Data Type
The BIT
data type is used to represent bit vectors. It stores bit values in binary notation, i.e., a binary value represented using 0
s and 1
s. A value of type BIT(n)
can store n
-bit values, where n
can range from 1
to 64
. However, SingleStore gives a warning if the size of BIT
type column is not 64
.
Bit-Value Literals
The BIT
data type can also be used to store and represent bit-value literals. By default, a bit-value literal is stored as a binary string. In numeric expressions, a bit-value literal is treated as an integer equivalent of the BIT
value. A bit-value literal can be specified using any of the following notations:
b'101'
orB'101'
- the leadingb
is case-insensitive0b101
- the leading0b
is case-sensitive
If you assign a value that is less than n
bits long, the value is left-padded with 0
s. For example, if you assign the value b'1100'
to a BIT(7)
column type, the assigned value is b'0001100'
.
Examples
The following example shows that bit-value literals are stored as binary strings.
SELECT b'1100101' AS 'String Value', CHARSET(b'1100101') AS 'Charset';
****
+--------------+---------+
| String Value | Charset |
+--------------+---------+
| e | binary |
+--------------+---------+
The following example shows how bit-value literals behave as integers in a numerical expression.
SELECT b'1100101' + 9;
****
+----------------+
| b'1100101' + 9 |
+----------------+
| 110 |
+----------------+
Comparing mismatched datatypes
Using comparisons between mismatched datatypes may degrade query performance and may use unsafe type conversions which can yield undesirable query results. SingleStore DB will display a warning for queries with potentially problematic comparisons between mismatched datatypes in the EXPLAIN
and information_schema
records for the query.
These warnings do not necessarily indicate a problem, and you may have valid reasons for comparing different datatypes. But these type mismatches are flagged to help you identify potential problems that you may not be aware of.
When you see these warnings, you should check whether the datatypes are expected to be different. You may wish to consider changing the datatypes of the fields or values involved. You may also wish to consider adding explicit type conversion operations, such as the cast operators or functions like STR_TO_DATE which convert between different types.
These warnings can be seen in EXPLAIN <query>
, information_schema.plancache.plan_warnings
, information_schema.mv_queries.plan_warnings
, and the PlanWarnings
column of show plancache
in textual form, as well as in EXPLAIN JSON <query>
, information_schema.plancache.plan_info
, and information_schema.mv_queries.plan_info
in JSON form.
Example
An example of these warnings is the following:
CREATE TABLE t (id VARCHAR(50), PRIMARY KEY (id));
EXPLAIN SELECT * FROM t WHERE id = 123;
****
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| WARNING: Comparisons between mismatched datatypes which may involve unsafe datatype conversions and/or degrade performance. Consider changing the datatypes, or adding explicit typecasts. See https://docs.singlestore.com/docs/mismatched-datatypes for more information. |
| |
| WARNING: Comparison between mismatched datatypes: (`t`.`id` = 123). Types 'varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL' vs 'bigint(20) NOT NULL'. |
| |
| Gather partitions:all alias:remote_0 |
| Project [t.id] |
| Filter [t.id = 123] |
| TableScan db.t, PRIMARY KEY (id) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
In this example, the query involves the expression t.i = 123
, where t.i
is a string field. When a numeric value is compared to a string, the string is converted to a numeric value, e.g. the string '123'
is converted to the number 123
.
This comparison may be problematic for multiple reasons:
-
The mismatched datatypes may indicate a mistake in how the query is written or how the table is defined. The query may behave differently than intended and yield undesired results. For example, all of the strings
'123'
,'123.0'
,'0123'
, and'123abc'
compare equal to the integer123
, so while the query may be intended to retrieve a single row with the specifiedid
, this equality comparison may actually yield multiple rows whoseid
values all compare equal to123
. -
The mismatched datatypes negatively impact performance of the query. If the field and constant were either both strings or both integers, the query plan would be able to use the index to efficiently seek to lookup the matching
id
. But because there are actually many possible stringid
s that match the number123
, which do not come in any particular order in terms of string lexicographic order, the query cannot seek into the index and instead must scan all the rows of the table.
data_conversion_compatibility_level
The global sync variable data_conversion_compatibility_level
controls the way certain data conversions are performed. This variable can have the following possible values: '6.0'
, '6.5'
, '7.0'
. Higher values introduce stricter checking of values, and will error for conversions that worked at lower levels. E.g., the '7.0'
level will fail the conversion of 'abc'
to an integer value, while the '6.0'
level will silently convert it to a NULL or 0. It’s recommended to set this variable to the highest available value for new application development.
For existing applications, it’s also recommended to use the highest available level, but it’s recommended that you test your application before deploying this change. In particular, changing the value of data_conversion_compatibility_level
can change the behavior of expressions in computed columns.
If a computed column value changes due to a change in data_conversion_compatibility_level
, columnstore sorting, indexes, and sharding can become logically corrupted. SingleStore DB does not recompile an existing plan when data_conversion_compatibility_level
or sql_mode
changes.
sql_mode
is persisted to the CREATE TABLE
statement. Therefore, the computed column of a table always uses the sql_mode
that the table is created with, which may be different from the current sql_mode
.
Best practices to avoid data corruption
- Review the computed column expressions when changing
data_conversion_compatibility_level
. - Change
data_conversion_compatibility_level
only as a part of an application upgrade process. - Perform application upgrade tests.
For example, if a computed column contains corrupted data and you have to switch to a higher data_conversion_compatibility_level
, you may need to drop the computed column before changing the level. Once the level is increased, add the computed column to the table. Alternatively, if a persisted computed column is used in a columnstore key, you may have to create a new table and select data into this new table. After copying the data, drop the old table and rename the new table.
The following examples demonstrate how the behavior of expressions may change if data_conversion_compatibility_level
is changed.
Example 1
SET GLOBAL data_conversion_compatibility_level = '6.0';
SELECT DATE('2019-04-20') + 2;
****
+------------------------+
| DATE('2019-04-20') + 2 |
+------------------------+
| 20190422 |
+------------------------+
Example 2
SET GLOBAL data_conversion_compatibility_level = '7.0';
SELECT DATE('2019-04-20') + 2;
****
+------------------------+
| DATE('2019-04-20') + 2 |
+------------------------+
| 2019-04-22 |
+------------------------+