Outdated Version

You are viewing an older version of this section. View current production version.

Management View Reference

This topic contains reference information about each workload profiling management view.

Concepts and Terminology

Before querying these management views, it’s important to understand the concepts behind the terminology used in each view’s schema.

Task

MemSQL is instrumented to divide its execution time into tasks. Workload profiling management views in the information_schema database report the resource usage of sets of tasks. The views differ mainly in that they report on different sets over different intervals of time.

Task Examples

  • A single query’s execution is instrumented with a task per partition, as well as a single task on the relevant aggregator.
  • A single garbage collection pass is instrumented with a single task.
  • A single replication pass against a database partition is instrumented with a single task.
  • Tasks may or may not be successful. For example, a query task which commits a transaction is successful, but a query task which encounters rollback is not.

Activity

An activity is a set of identical tasks.

All tasks associated with the same node ID, database, partition, and name comprise the same activity. The tasks in an activity can be considered to share or be instances of an activity.

In practice, tasks share an activity when they instrument multiple repetitions of the same recurring work.

Every activity has a type of either Query, Database, or System. Tasks in a database activity are those which can be associated with an individual database, but not with any individual query. Tasks in a system activity can be associated with neither.

Every activity has a name and an aggregator activity name. We say that an activity is high-level if its aggregator activity name is equal to its name.

If an activity is not high-level, then its aggregator activity name is the name of the high-level activity to which it contributes.

In practice, we report aggregator activity name in order to link per-partition query activities with the unique aggregator activity associated with their query. The actual name of these activities may not be the same on all nodes.

Activity Examples

  • All tasks instrumenting the same query against the same partition share an activity, which has Query type. Tasks instrumenting the query on different partitions have different activities.
  • All garbage collection pass tasks share an activity, which has System type.
  • All replication tasks against the same database partition share an activity, which has Database type.
  • All activities associated with a query have the same aggregator activity name, which is the name of the unique high-level activity associated with the query on the aggregator node.

mv_activities

The mv_activities view in the information_schema database is a high-level summary profile of all tasks which ran recently on any node in the cluster.

It holds a row per high-level activity H. Each row describes the set of tasks which have run over a recent interval of time and which are either instances of the high level activity H, or else instances of an activity L whose aggregator activity name is the name of H. It reports the sum of the profiling statistics of each task in the set, as collected over the recent interval.

The effect of grouping the instances of multiple activities together is mainly to group all tasks associated with a query, across all partitions, into a single row.

mv_activities determines recent resource use by computing the change in mv_activities_cumulative over an interval of time. This interval is controlled by the value of the activities_delta_sleep_s session variable.

mv_activities may only be queried while connected to an aggregator node.

We recommend using this view to begin a performance investigation, as it is the most concise.

However, we recommend against the use of mv_activities to compute the average latency or resource usage of a query. This is because its statistics include the latency and resource usage of currently running tasks, which will skew attempted average calculations. We recommend mv_finished_tasks for this purpose instead.

Column name Description
activity_type The type of the high-level activity.
activity_name The name of the high-level activity. This column is often human-readable, but does not include the full query text for query tasks. Join with mv_queries for the query text.
database_name The name of the database associated with the activity, or NULL if none could be assigned.
run_count The number of instances which were running at the end of the interval.
success_count The number of instances which completed successfully during the interval.
failure_count The number of instances which completed unsuccessfully during the interval.
SIMPLIFIED STATISTICS This view contains all simplified statistics columns.

mv_activities_extended

The mv_activities_extended view in the information_schema database is a detailed profile of all tasks which ran recently on any node in the cluster.

It holds a row per activity. Each row describes the instances of its activity which ran during a recent interval of time. It reports the sum of the profiling statistics of each task in the set, as collected over the recent interval.

mv_activities determines recent resource use by computing the change in mv_activities_extended_cumulative over an interval of time. This interval is controlled by the value of the activities_delta_sleep_s session variable.

mv_activities_extended may only be queried while connected to an aggregator node.

We recommend using this view to get fine-grained understanding of the system behavior described in mv_activities. The main distinctions between mv_activities and mv_activities_extended are:

  • For most queries, mv_activities_extended will report the resource usage of the query with at least a row per partition. mv_activities aggregates these into a single row per query.
  • mv_activities reports simplified statistics, while mv_activities_extended reports extended statistics.

However, we recommend against the use of mv_activities_extended to compute the average latency or resource usage of a query. This is because its statistics include the latency and resource usage of currently running tasks, which will skew attempted average calculations. We recommend mv_finished_tasks for this purpose instead.

Column name Description
node_id An ID equal to id in the row of mv_nodes describing the node on which the instances ran.
activity_type The type of the activity.
activity_name The name of the activity. This column is often human-readable, but does not include the full query text for query tasks. Join with mv_queries for the query text.
aggregator_activity_name The aggregator activity name of the activity.
database_name The name of the database associated with the activity, or NULL if none could be assigned.
partition_id The unique ID of the database partition associated with the activity, or NULL if none could be assigned.
run_count The number of instances which were running at the end of the interval.
success_count The number of instances which completed successfully during the interval.
failure_count The number of instances which completed unsuccessfully during the interval.
EXTENDED STATISTICS This view contains all extended statistics columns.

mv_activities_cumulative

The mv_activities_cumulative view in the information_schema database is a high-level summary profile of completed and currently running tasks on all nodes of the cluster.

It holds a row per high-level activity H. Each row describes tasks which have completed or are currently running, and which are either instances of the high level activity H, or else instances of an activity L whose aggregator activity name is the name of H. It reports the sum of the profiling statistics of each task in the set, as collected over its lifetime.

The effect of grouping the instances of multiple activities together is mainly to group all tasks associated with a query, across all partitions, into a single row.

mv_activities_cumulative may only be queried while connected to an aggregator node.

We recommend against extensive use of this table. It doesn’t accurately describe past system behavior because it doesn’t include all completed instances. This is because historical task statistics may be discarded by MemSQL at unspecified times.

Column name Description
activity_type The type of the activities.
activity_name An ID shared by all instances of the high level activity. This ID is often human-readable, but does not include the query text for query tasks. See mv_queries for the raw query text.
database_name The name of the database associated with the activity, or NULL if none could be assigned.
last_finished_timestamp The timestamp of the most recent completion of an instance of this activity.
run_count The number of running instances of the activity.
success_count The number of successfully completed instances of the activity.
failure_count The number of unsuccessfully completed instances of the activity.
SIMPLIFIED STATISTICS This view contains all simplified statistics columns.

mv_activities_extended_cumulative

The mv_activities_extended_cumulative view in the information_schema database is a detailed profile of completed and currently running tasks on all nodes of the cluster.

It holds a row per activity. Each row describes completed and currently running instances of its activity. It reports the sum of the profiling statistics of each instance, as collected over its lifetime.

mv_activities_extended_cumulative may only be queried while connected to an aggregator node.

We recommend against extensive use of this table. It doesn’t accurately describe past system behavior because it doesn’t include all completed instances. This is because historical task statistics may be discarded by MemSQL at unspecified times.

Column name Description
node_id An ID equal to id in the row of mv_nodes describing the node on which the activity’s instances have run.
activity_type The type of the activity.
activity_name An ID shared by all instances of the activity. This ID is often human-readable, but does not include the query text for query tasks. See mv_queries.
aggregator_activity_name The aggregator_activity_name of the activity.
database_name The name of the database associated with the activity, or NULL if none could be assigned.
partition_id The unique ID of the database partition associated with the activity, or NULL if none could be assigned.
last_finished_timestamp The timestamp of the most recent completion of an instance of this activity.
run_count The number of running instances of this activity.
success_count The number of successfully completed instances of this activity.
failure_count The number of unsuccessfully completed instances of this activity.
EXTENDED_STATISTICS This view contains all extended statistics columns.

mv_aggregated_column_usage

The mv_aggregated_column_usage view in the information_schema database provides a summary of mv_query_column_usage, aggregating over the columns over all queries run.

Column name Description
database_name The name of the database, or NULL if none could be assigned.
table_name The name of the table to which the column belongs.
column_name The aggregated column information is displayed for this column.
equijoins The number of equijoins this column has been involved in over all queries run over the table.
inquality_joins The number of inquality joins this column has been involved in over all queries run over the table.
equality_preds The number of equality predicates in the WHERE or ON clause this column has been involved in all queries run on the table.
inequality_preds The number of inequality predicates in the WHERE or ON clause this column has been involved in all queries run on the table.
groupbys The number of GROUP BY clause in the column involved in all queries run on the table.
reshuffles The number of reshuffle key in the column, involved in all queries run on the table.
orderbys The number of ORDER BY clause in the column involved in all queries run on the table.
outputs The number of times the column was returned as the result of a query.
query_appearances The number of distinct queries the column appears in all queries run on the table, which is equivalent to the number of times the column appears in the first table.

mv_finished_tasks

The mv_finished_tasks view in the information_schema database is a detailed profile of completed tasks on all nodes of the cluster.

It holds a row per activity. Each row describes completed instances of its activity. It reports the sum of the profiling statistics of each instance, as collected over its lifetime.

mv_finished_tasks may only be queried while connected to an aggregator node.

We recommend against extensive use of this table. It doesn’t accurately describe past system behavior because it doesn’t include all completed instances. It will also hide the performance impact of long-running tasks which have yet to complete.

However, we do recommend using this table to estimate the average latency or resource usage of a query. This is because all other statistics tables include currently running tasks in their profiles, skewing attempted average calculations.

Column name Description
node_id An ID equal to id in the row of mv_nodes describing the node on which the activity’s instances ran.
activity_type The type of the activity.
activity_name An ID shared by all instances of the activity. This ID is often human-readable, but does not include the query text for query tasks. Join with mv_queries for the query text.
aggregator_activity_name The aggregator_activity_name of the activity.
database_name The name of the database associated with the activity, or NULL if none could be assigned.
partition_id The unique ID of the database partition associated with the activity, or NULL if none could be assigned.
last_finished_timestamp The timestamp of the most recent completion of an instance of the activity.
success_count The number of successfully completed instances of this activity.
failure_count The number of unsuccessfully completed instances of this activity.
EXTENDED STATISTICS This view contains all extended statistics columns.

mv_tasks

The mv_tasks view in the information_schema database is a detailed profile of all currently running tasks on all nodes of the cluster.

It holds a row per running task. Each row reports the profiling statistics of the task, as collected between the time it began and the current time.

mv_tasks may only be queried while connected to an aggregator node.

We recommend against extensive use of this table. It will hide the performance impact of short-lived but frequent tasks, and exaggerate the impact of long-lived tasks.

Column name Description
node_id An ID equal to id in the row of mv_nodes describing the node on which the task is running.
activity_type The type of the task’s activity.
activity_name An ID shared by all instances of the task’s activity. This ID is often human-readable, but does not include the query text for query tasks.
aggregator_activity_name The aggregator_activity_name of the task’s activity.
database_name The name of the database associated with the task’s activity, or NULL if none could be assigned.
partition_id The unique ID of the database partition associated with the task’s activity, or NULL if none could be assigned.
raw_task_name The full name associated with the task, or NULL if none could be assigned. In particular, this will hold the raw query text for query tasks. It differs from query_text in mv_nodes because it is fully parameterized. Note that this column is only available in mv_tasks.
EXTENDED STATISTICS This view contains all extended statistics columns.

mv_nodes

The mv_nodes view in the information_schema database describes all MemSQL nodes in the cluster. It holds one row per node.

mv_nodes may only be queried while connected to an aggregator node.

Column name Description
id The ID of the node. This ID matches the node_id reported for tasks running on the node.
ip_addr The IP address of the node.
port The port on which the node is listening.
type MA, Leaf, or CA.
state Online, Offline, Detached, or Attaching.
availability_group The availability group of the node.
num_cpus The number of logical CPU cores on which the node may execute.
memory_used_mb The value of total_server_memory on the node.
max_memory_mb Maximum allowed memory for node

mv_queries

The mv_queries view in the information_schema database describes queries seen by MemSQL. It holds a row for each query sent to the cluster by a user, as well as rows for artificial queries sent between nodes according to internal MemSQL logic.

Its purpose is to map the activity_name of query activities to their associated de-parameterized query text.

mv_queries may only be queried while connected to an aggregator node.

Column name Description
activity_name The ID associated with the query. This will be the activity_name of all query tasks associated with the query.
query_text The de-parameterized text of the query.

mv_query_column_usage

The mv_query_column_usage view in the information_schema database provides information about the usage of columns by queries in the workload. Details about column that are used in joins, filters, group-bys, order-bys or reshuffles help to understand the frequency and cost of use of columns, and make better decisions about indexing and sharding of data.

Column name Description
activity_name The ID associated with the query. This will be the activity_name of all query tasks associated with the query.
database_name The name of the database, or NULL if none could be assigned.
table_name The name of the table in the query.
column_name The name of the column that exists in the table.
equijoins The number of times this column was involved in an equijoin in the query.
non_equijoins The number of times the column was involved in a non-equijoin in the query.
equality_preds The number of times the column was involved in equality predicates in the WHERE or ON clause in the query.
inequality_preds The number of times the column was involved in inequality predicates in the WHERE or ON clause in the query.
groupbys The number of times the column was involved in a GROUP BY clause in the query.
orderbys The number of times the column was involved in a ORDER BY clause in the query.
reshuffles The number of times the column was member of reshuffle key in the query.
outputs The number of times the column was returned as the result of the query.