Configuring segment size in columnstore tables
In SingleStore DB, data in a columnstore table are organized into multiple row segments. For certain workloads, the size of row segments significantly affects performance. Considerations include:
- In general, larger segments compress better than smaller ones. This leads to less disk usage and faster scan across the table.
- On the other hand, smaller segments could benefit more from segment elimination. As a result, queries with highly selective filters on the columnstore index run faster with smaller segments.
In SingleStore DB, the default size of the row segments is controlled by the global variable columnstore_segment_rows
. By default, columnstore_segment_rows
has a value of 1024000
, meaning that each segment contains 1024000
rows by default.
-
The variable
columnar_segment_rows
from previous MemSQL versions was deprecated in favor ofcolumnstore_segment_rows
.columnar_segment_rows
exists as an alias tocolumnstore_segment_rows
. -
A cluster upgraded from a version before 6.0 will have the previous default value of
102400
forcolumnstore_segment_rows
. See Upgrading MemSQL to 6.0 for more information.
In addition to the global variable, it is possible to overwrite the global setting and set the segment size for a specific columnstore table. This can be done during the table creation, or by altering an existing table. For example:
CREATE TABLE t (id INT, KEY (id) USING CLUSTERED COLUMNSTORE WITH (columnstore_segment_rows=100000));
SHOW CREATE TABLE t;
****
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t | CREATE TABLE `t` (
`id` int(11) DEFAULT NULL,
KEY `id` (`id`) /*!90619 USING CLUSTERED COLUMNSTORE */ /*!90621 WITH(COLUMNSTORE_SEGMENT_ROWS=100000) */
) |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
Now, modify the value of columnstore_segment_rows
for this columnstore. The MODIFY KEY keyName
clause identifies
the index whose settings are to be changed (in this case, the columnstore index).
ALTER TABLE t MODIFY KEY id SET (columnstore_segment_rows=20000);
SHOW CREATE TABLE t;
****
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t | CREATE TABLE `t` (
`id` int(11) DEFAULT NULL,
KEY `id` (`id`) /*!90619 USING CLUSTERED COLUMNSTORE */ /*!90621 WITH(COLUMNSTORE_SEGMENT_ROWS=20000) */
) |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
Configuring the rowstore-backed segment size in columnstore tables
In addition to the on-disk segments, each SingleStore DB columnstore also has an in-memory rowstore-backed segment. Small-batch inserts write to the rowstore-backed segment first, while the background flusher
process periodically compresses those recently inserted rows and creates on-disk segments.
The size of the rowstore-backed segment is controlled by the global variable columnstore_flush_bytes
. The background flusher
process starts to create on-disk segments when the amount of data in the rowstore-backed segment exceeds columnstore_flush_bytes
, with a default of 32 MB
. Additionally, insert and load operations are considered small-batch if they write to each partition less than columnstore_flush_bytes * columnstore_disk_insert_threshold
, which is by default 16 MB
at a time.
The columnstore_disk_insert_threshold
value is a fractional value with a default of 0.5
. It controls the minimum size of a disk-backed row segment created by insert and load operations. Note that if OPTIMIZE TABLE FLUSH
is manually run, the minimum segment size can be much smaller.
Considerations about tuning the rowstore-backed segment size include:
- The rowstore-backed segment is stored in-memory. Therefore, the table consumes less memory when the rowstore-backed segment is smaller.
- The
background flusher
process can write more rows to disk at once if the rowstore-backed segment is larger, reducing the number of disk write operations during data ingestion.
Similar to columnstore_segment_rows
, the columnstore_flush_bytes
value can also be configured per-table with the following syntax:
CREATE TABLE t2 (id INT, KEY (id) USING CLUSTERED COLUMNSTORE WITH (columnstore_flush_bytes=4194304, columnstore_segment_rows=100000));
ALTER TABLE t2 MODIFY KEY id SET (columnstore_flush_bytes=8388608);