View the Dashboards
When all cluster monitoring components are installed, configured, and running, the Grafana dashboards can be used to monitor cluster health over time.
Identify Trends
Each dashboard provides insights that can be used to identify trends that may require intervention, including:
Active Session History
Chart Name | What it shows | When to use it |
---|---|---|
Active Session History | The activities running on the cluster and their respective resource usage (including CPU, memory, and network) | To view queries that are currently running, and that have been run, on the cluster; to view current and past session wait events and identify the databases and activities that consume considerable resources, including why queries are running slowly and where a cluster is resource-constrained |
Activity History
Chart Name | What it shows | When to use it |
---|---|---|
Execution History | The number of times a given query shape was executed over time | To view query statistics over time; to determine the number of times a given query has been run; to identify whether the performance and resource usage of a single activity has regressed over time |
Average Time | The average time a given query spent waiting for resources over time (milliseconds) | To compare the time a query spends on resource usage with previous executions and determine whether it is performing similarly or differently |
Average Resource Use | The average resource use for a given query, including disk bytes, memory bytes, memory bytes/second, and network bytes/second | To compare a query’s resource usage with previous executions and determine whether it is using resources similarly or differently |
Detailed Cluster View
Chart Name | What it shows | When to use it |
---|---|---|
Database CPU Breakdown | The CPU cycles spent by each activity, grouped by database | To identify which databases incur the most CPU usage. Note: A blank database indicates system activity, which is not related to a user database. |
Query Rate | The number of reads/writes per second of the queries running on the system | To understand typical (“normal”) cluster activity in order to benchmark workloads and their query rates, and to identify anomalies in the read/write workload |
Rows Read or Written | The number of rows read/written | To understand typical (“normal”) cluster activity in order to benchmark workload read/write row counts, and to identify anomalies in the number of rows read/written |
SysInfo CPU | The percentage of the host’s CPU that is being used | To understand CPU usage and host resource usage in general, or for a given workload; to identify whether any non-SingleStore DB activity is affecting a host’s CPU |
SysInfo Memory | The percentage of the host’s memory that is being used | To understand host memory usage for a given workload over time; to identify whether any non-SingleStore DB activity is affecting a host’s memory |
SysInfo Network | The network bytes sent and received | To understand network usage for a given workload and identify bottlenecks; to identify whether any non-SingleStore DB activity is affecting a host’s network |
memsqld Memory | A summary of host memory usage | To view cross-sections of memory usage within the cluster and identify anomalous memory use |
Workload Management Queries | The queries running and their states if affected by workload management | To understand the current cluster load and identify high-workload issues that are affecting the cluster’s ability to process queries |
Cluster Events | The count and output of the warnings and errors on the cluster (as per mv_events) | To recognize cluster health issues by reviewing the number of events; to drill into cluster events to identify and understand what the issues are |
Memory Usage
Chart Name | What it shows | When to use it |
---|---|---|
Used vs. Total Limit | The memory in use compared to the total memory available (megabytes) | To perform capacity planning for memory; to identify whether the cluster is not performing optimally due to a shortage of memory |
Query Memory vs. Total Limit | The query memory in use compared to the total memory available (megabytes) | To perform capacity planning for workloads; to identify whether workloads in general, or workload spikes in particular, are putting the cluster at risk of running out of memory |
Data Memory Used vs. Total Limit | The data memory in use compared to the total memory available (megabytes) | To perform capacity planning for data memory; to identify whether given write workloads are putting the cluster at risk of running out of memory |
Internal Memory Allocators vs. Limit | The memory used by SingleStore DB memory allocators (megabytes) | To identify why memory allocations have increased, or are anomalously large, when there are no other indicators of increased memory use, such as workload or data; to discover where memory is allocated (table, query, etc.) |
Detailed Breakout of Memory Allocators vs. Limit | The memory used by extended SingleStore DB memory allocators (megabytes) | To identify whether any memory allocations have increased, or are anomalously large, when there are no other indicators of increased memory use; to discover where memory is allocated (table, query, etc.) |
SingleStore DB Status & Change
Chart Name | What it shows | When to use it |
---|---|---|
Show Status and Change (two charts) | The values and changes in SHOW STATUS (sizing units based on the variable) | To identify if any anomalous changes have occurred to SingleStore DB status variables |
SingleStore DB Variables & Change
Chart Name | What it shows | When to use it |
---|---|---|
Show Variables and Change (two charts) | The values and changes to SingleStore DB variables (sizing units depending on the variable) | To view changes to SingleStore DB engine variables over time and identify if any anomalous changes have occurred |
Information Schema View
Chart Name | What it shows | When to use it |
---|---|---|
Table Statistics | The row counts for tables across schemas | To identify anomalies in table sizes in general, and workloads in particular |
Node Metrics Breakout
Chart Name | What it shows | When to use it |
---|---|---|
CPU Utilization | The System Info CPU utilization (percent) | To view a host’s CPU utilization and hardware health to identify if processes outside of SingleStore DB could be affecting them |
Filesystem | The filesystem usage (bytes) | To view host-level filesystem usage and identify if processes outside of SingleStore DB could be affecting it |
Network Rate | The System Info network rate (bytes) | To view host-level network usage and identify if processes outside of SingleStore DB could be affecting it |
Memory Bytes | The System Info memory usage (bytes) | To view host-level and cgroup memory usage and identify if processes outside of SingleStore DB could be affecting them |
Node Metrics Drilldown
Chart Name | What it shows | When to use it |
---|---|---|
Node Metrics Drilldown | The memsql_exporter execution details | To determine if the exporter process is running efficiently and/or to identify lags in data collection |
Related Resources
Troubleshoot Your Monitoring Setup
Pipelines
Check the Monitoring Tables for Data
- Connect to the database.
- Run the following SQL. The default database name is `metrics`. If your database name is different from the default name, replace `metrics` with your database name.

      use metrics;
      select * from metrics limit 10;

  Optional: Run a similar `select *` query against each of the other monitoring tables.

  If these queries return an empty set, review the pipelines and the pipelines error table using the steps that follow.
- Review the monitoring pipelines.

      show pipelines status

- If a monitoring pipeline (with a name resembling `*_metrics` or `*_blobs`) is in a state other than `running`, start the pipeline.

      START PIPELINE <pipeline-name>

- Check the `information_schema.pipelines_errors` table for errors.

      select * from information_schema.pipelines_errors
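When there are several monitoring pipelines, the "find the ones that are not running and start them" step can be scripted. The following is a minimal sketch, not an official tool: `pipelines_to_start` is a hypothetical helper name, and it assumes the pipeline listing arrives as tab-separated lines with the pipeline name in the first column and its state in the second (which is what a batch-mode SQL client typically prints for `SHOW PIPELINES`; verify the column layout on your version).

```shell
#!/usr/bin/env bash
# Sketch: turn a pipeline listing into START PIPELINE statements.
# ASSUMPTION: stdin holds tab-separated lines "<name>\t<state>[\t...]".

# Hypothetical helper. Prints one START PIPELINE statement for every
# monitoring pipeline (name ending in "metrics" or "blobs") whose state
# is anything other than "running" (compared case-insensitively).
pipelines_to_start() {
  awk -F'\t' '
    NF >= 2 && $1 ~ /(metrics|blobs)$/ && tolower($2) != "running" {
      printf "START PIPELINE `%s`;\n", $1
    }'
}

# Example usage (assumes the mysql client and a reachable cluster;
# <host> and <user> are placeholders):
#   mysql -h <host> -u <user> -p -N -B -D metrics -e 'SHOW PIPELINES' \
#     | pipelines_to_start
```

The helper only prints statements; review the output before running it against the cluster.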
Resolve Pipeline Errors
If you receive a `Cannot extract data for the pipeline` error in the `pipelines_errors` table, perform the following steps.
- Confirm that port `9104` is accessible from all hosts in the cluster. This is the default port used for monitoring. To test this, run the following command at the Linux command line and review the output.

      curl http://<endpoint>:9104/cluster-metrics

  For example:

      curl http://192.168.1.100:9104/cluster-metrics

- If the hostname of the Master Aggregator is `localhost`, and a pipeline was created using `localhost`, recreate the pipeline using the Master Aggregator host’s IP address. For example:

  `metrics` pipeline:

      create or replace pipeline `metrics` as load data prometheus_exporter "http://<host-ip-address>:9104/cluster-metrics" config '{"is_memsql_internal":true}' into procedure `load_metrics` format json;
      start pipeline if not running metrics;

  `blobs` pipeline:

      create or replace pipeline `blobs` as load data prometheus_exporter "http://<host-ip-address>:9104/samples" config '{"is_memsql_internal":true, "download_type":"samples"}' into procedure `load_blobs` format json;
      start pipeline if not running blobs;
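Both steps above can be wrapped in small shell functions so they are easy to repeat for each host. This is a sketch under stated assumptions: the function names `check_exporter` and `recreate_pipelines_sql` are hypothetical, the five-second timeout is an arbitrary choice, and the SQL is the same text as the statements above with the IP address substituted in.

```shell
#!/usr/bin/env bash
# Sketch of the two checks above. Function names are hypothetical.

# Probe the exporter endpoint on one host. Returns curl's exit status:
# 0 when the endpoint answers with a successful HTTP status, non-zero
# on connection failure, timeout, or an HTTP error status.
check_exporter() {
  local host="$1"
  curl --silent --fail --max-time 5 --output /dev/null \
    "http://${host}:9104/cluster-metrics"
}

# Print the statements that recreate both monitoring pipelines against
# the given Master Aggregator IP address instead of localhost.
# Backticks are escaped so the here-document does not run them.
recreate_pipelines_sql() {
  local ip="$1"
  cat <<EOF
create or replace pipeline \`metrics\` as load data prometheus_exporter "http://${ip}:9104/cluster-metrics" config '{"is_memsql_internal":true}' into procedure \`load_metrics\` format json;
start pipeline if not running metrics;
create or replace pipeline \`blobs\` as load data prometheus_exporter "http://${ip}:9104/samples" config '{"is_memsql_internal":true, "download_type":"samples"}' into procedure \`load_blobs\` format json;
start pipeline if not running blobs;
EOF
}

# Example usage (192.168.1.100 is the sample address used above):
#   check_exporter 192.168.1.100 && echo "exporter reachable"
#   recreate_pipelines_sql 192.168.1.100
```

`recreate_pipelines_sql` only prints the statements, so they can be inspected before being run in the database that holds the monitoring tables.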