Outdated Version

You are viewing an older version of this section. View current production version.

Choosing a Table Storage Type

The following table enumerates the strengths and intended uses of the rowstore and columnstore.

In-Memory Rowstore Flash, SSD, or Disk-based Columnstore
Operational/transactional workloads Analytical workloads
Fast inserts and updates over a small or large number of rows Fast inserts over a small or large number of rows
Random seek performance Fast aggregations and table scans
Updates/deletes are frequent Updates/deletes are rare
Compression

Conceptual Differences Between Row and ColumnStores

Row-oriented stores, or “rowstores” are the most common type of data stores used by relational databases. As the name suggests, a rowstore treats each row as a unit and stores all fields for a given row together in the same physical location. This makes rowstores great for transactional workloads, where the database frequently selects, inserts, updates, and deletes individual rows, often referencing either most or all columns.

Column-oriented stores, or “columnstores” treat each column as a unit and stores segments of data for each column together in the same physical location. This enables two important capabilities. One is to scan each column individually, in essence, being able to scan only the columns needed for the query, with good cache locality during the scan. The other capability is that columnstores lend themselves well to compression. For example, repeating and similar values can easily be compressed together. A simplified example is shown here:

Logical table and rowstore representation

ProductId Color Price
1 Red 10
2 Red 20
3 Black 20
4 White 20

Columnstore

ProductId
1
2
3
4
Color
Red x 2
Black
White

“x 2” indicates that the value is repeated twice.

Price
10
20 x 3

“x 3” indicates that the value is repeated three times.

By physically organizing data by columns, the database is able to scan and aggregate data for individual columns very quickly, simply because less data needs to be scanned. This makes columnstores well-suited to analytical workloads. The trade-off of using the columnstore is that it does not lend itself well to manipulating individual rows quickly. Columnstores are usually batch-loaded for optimal performance, although small-batch and streaming insert to columnstores can also perform well. The combination of efficient batch loading and processing of analytic queries make columnstore highly suitable for data warehousing scenarios.

Related Topics