Writing Columnstore Data

Unlike other columnstore implementations, SingleStore DB supports very fast, small-batch writes (such as single row inserts) directly into columnstore tables. This is implemented by storing newly written rows in a row-oriented skiplist before flushing them to the column-oriented format. This is all handled automatically behind the scenes, and rows are visible to reads as soon as they are committed to the row-oriented skiplist.
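As a rough illustration of this write path, the following Python sketch models a table that accepts single-row inserts into a row-oriented buffer and later converts that buffer into a column-oriented segment. It is a toy model, not SingleStore DB code: the class name `ToyColumnstoreTable`, the `flush_threshold` parameter, and the use of a plain list in place of a skiplist are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumnstoreTable:
    """Toy model: a row-oriented buffer in front of column-oriented segments."""

    columns: tuple
    # Hypothetical row-count trigger; the real engine uses its own heuristics.
    flush_threshold: int = 4
    # A plain list stands in for the row-oriented skiplist.
    row_buffer: list = field(default_factory=list)
    # Each segment maps a column name to the list of that column's values.
    segments: list = field(default_factory=list)

    def insert(self, row: dict) -> None:
        """Accept a single-row write; it lands in the row buffer first."""
        self.row_buffer.append(row)
        if len(self.row_buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Pivot the buffered rows into one column-oriented segment."""
        if not self.row_buffer:
            return
        segment = {c: [r[c] for r in self.row_buffer] for c in self.columns}
        self.segments.append(segment)
        self.row_buffer = []

    def scan(self, column: str) -> list:
        """Read one column; buffered rows are visible alongside flushed ones."""
        values = []
        for segment in self.segments:
            values.extend(segment[column])
        values.extend(r[column] for r in self.row_buffer)
        return values
```

In this sketch, a row inserted with `insert` is returned by `scan` immediately, whether or not it has been flushed, which mirrors the visibility behavior described above.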

The following operations can change the data stored in a SingleStore DB columnstore index.

  1. Insert - An insert into a columnstore index either goes into the rowstore-backed row segment or creates a new columnstore-backed row segment. The engine decides automatically, using heuristics based on the size of the insert and the current state of the columnstore index. An insert large enough to be written directly to the columnstore is called an optimized insert. An optimized insert loads a set of data and pre-sorts it in the order of the columnstore key column(s), so the load always creates a single new row segment. By default, INSERT or LOAD DATA queries writing more than 16 MB per partition will produce an optimized insert.
  2. Delete - Deleting a row in a columnstore index marks the row as deleted in the segment metadata and leaves the data in place within the row segment. Segments that contain only deleted rows are removed, and the optimization process covered below compacts segments that require optimization.
  3. Update - An update in a columnstore index is internally performed as a delete followed by an insert within a transaction. As with inserts, small enough updates will write to the rowstore-backed row segment.
  4. Optimization - SingleStore DB has optimization routines that run automatically and can also be started manually. These routines attempt to automatically merge row segments together in order to improve query efficiency. DML statements can be performed while columnstore optimizations take place. For more information, see the OPTIMIZE TABLE topic.
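
The four operations above can be sketched with a small in-memory model. This is a simplified illustration under stated assumptions, not the engine's implementation: `ToySegment`, `ToyColumnstore`, and the `optimized_insert_rows` row-count threshold are hypothetical names invented for the example (the real default threshold is expressed in MB per partition, and the real heuristics also consider the state of the index).

```python
class ToySegment:
    """A run of rows sorted by the key, plus a per-row deleted flag."""

    def __init__(self, rows, key):
        self.rows = sorted(rows, key=lambda r: r[key])
        # Stands in for the deleted markers kept in segment metadata.
        self.deleted = [False] * len(self.rows)

    def live_rows(self):
        return [r for r, d in zip(self.rows, self.deleted) if not d]


class ToyColumnstore:
    def __init__(self, key, optimized_insert_rows=3):
        self.key = key
        # Hypothetical stand-in for the size-based insert heuristic.
        self.optimized_insert_rows = optimized_insert_rows
        self.rowstore = []   # rowstore-backed row segment
        self.segments = []   # columnstore-backed row segments

    def insert(self, rows):
        rows = [dict(r) for r in rows]
        if len(rows) >= self.optimized_insert_rows:
            # Optimized insert: pre-sort and create one new segment.
            self.segments.append(ToySegment(rows, self.key))
        else:
            self.rowstore.extend(rows)

    def delete(self, predicate):
        self.rowstore = [r for r in self.rowstore if not predicate(r)]
        for seg in self.segments:
            for i, row in enumerate(seg.rows):
                if predicate(row):
                    seg.deleted[i] = True  # mark only; data stays in place
        # Segments that contain only deleted rows are removed.
        self.segments = [s for s in self.segments if not all(s.deleted)]

    def update(self, predicate, changes):
        # A delete followed by an insert of the modified rows.
        matched = [r for r in self.scan() if predicate(r)]
        self.delete(predicate)
        self.insert([{**r, **changes} for r in matched])

    def optimize(self):
        # Merge all segments into one, dropping rows marked as deleted.
        live = [r for s in self.segments for r in s.live_rows()]
        self.segments = [ToySegment(live, self.key)] if live else []

    def scan(self):
        rows = [r for s in self.segments for r in s.live_rows()]
        return rows + list(self.rowstore)
```

The model omits transactions and concurrency entirely, so it does not show that DML can proceed while optimization runs; it only illustrates where rows end up after each kind of change.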