Next Steps

View the Dashboards

When all cluster monitoring components are installed, configured, and running, the Grafana dashboards can be used to monitor cluster health over time.

Identify Trends

Each dashboard provides insights that can be used to identify trends that may require intervention, including:

Active Session History

Chart Name	What it shows	When to use it
Active Session History	The activities running on the cluster and their respective resource usage (including CPU, memory, and network)	To view queries that are currently running, or that have previously run, on the cluster; to view current and past session wait events and identify the databases and activities that consume considerable resources, including why queries are running slowly and where a cluster is resource-constrained

Activity History

Chart Name	What it shows	When to use it
Execution History	The number of times a given query shape was executed over time	To view query statistics over time; to determine the number of times a given query has been run; to identify if the performance and resource usage of a single activity has regressed over time
Average Time	The average time spent waiting for resources for a given query over time (milliseconds)	To compare the history of the time a query spends waiting for resources and understand whether it is performing similarly to, or differently than, previous executions
Average Resource Use	The average resource use for a given query, including disk bytes, memory bytes, memory bytes/second, and network bytes/second	To compare the history of query resource usage and understand whether the query is using resources similarly to, or differently than, previous executions

Detailed Cluster View

Chart Name	What it shows	When to use it
Database CPU Breakdown	The CPU cycles spent by each activity, grouped by database	To identify which databases incur the most CPU usage. Note: A blank database name indicates system activity, which is not related to a user database.
Query Rate	The number of reads/writes per second of the queries running on the system	To understand typical (“normal”) cluster activity in order to benchmark workloads and their query rates, and to identify anomalies in the read/write workload
Rows Read or Written	The number of rows read/written	To understand typical (“normal”) cluster activity in order to benchmark workload read/write row counts, and to identify anomalies in the number of rows read/written
SysInfo CPU	The percentage of the host’s CPU that is being used	To understand CPU usage and host resource usage in general, or for a given workload; to identify if any non-SingleStore DB activity is affecting a host’s CPU
SysInfo Memory	The percentage of the host’s memory that is being used	To understand host memory usage for a given workload over time; to identify if any non-SingleStore DB activity is affecting a host’s memory
SysInfo Network	The network bytes sent and received	To understand network usage for a given workload and identify bottlenecks; to identify if any non-SingleStore DB activity is affecting a host’s network
`memsqld` Memory	A summary of host memory usage	To view cross-sections of memory usage within the cluster and identify anomalous memory use
Workload Management Queries	The queries running and their states if affected by workload management	To understand the current cluster load and identify high-workload issues that are affecting the cluster’s ability to process queries
Cluster Events	The count and output of the warnings and errors on the cluster (as per mv_events)	To recognize cluster health issues by reviewing the number of events; to drill into cluster events to identify and understand what the issues are

Memory Usage

Chart Name	What it shows	When to use it
Used vs. Total Limit	The memory in use compared to the total memory available (megabytes)	To perform capacity planning for memory; to identify if the cluster is not performing optimally due to a shortage of memory
Query Memory vs. Total Limit	The query memory in use compared to the total memory available (megabytes)	To perform capacity planning for workloads; to identify if workloads in general, or workload spikes in particular, are putting the cluster at risk of running out of memory
Data Memory Used vs. Total Limit	The data memory in use compared to the total memory available (megabytes)	To perform capacity planning for data memory; to identify if given write workloads are putting the cluster at risk of running out of memory
Internal Memory Allocators vs. Limit	The memory used by SingleStore DB memory allocators (megabytes)	To identify why memory allocations have increased, or are anomalously large, when there are no other indicators of increased memory use, such as workload or data; to discover where memory is allocated (table, query, etc.)
Detailed Breakout of Memory Allocators vs. Limit	The memory used by extended SingleStore DB memory allocators (megabytes)	To identify if any memory allocations have increased, or are anomalously large, when there are no other indicators of increased memory use; to discover where memory is allocated (table, query, etc.)

SingleStore DB Status & Change

Chart Name	What it shows	When to use it
Show Status and Change (two charts)	The values and changes in SHOW STATUS (sizing units based on the variable)	To identify if any anomalous changes have occurred to SingleStore DB status variables

SingleStore DB Variables & Change

Chart Name	What it shows	When to use it
Show Variables and Change (two charts)	The values and changes to SingleStore DB variables (sizing units depending on the variable)	To view changes to SingleStore DB engine variables over time to identify if any anomalous changes have occurred

Information Schema View

Chart Name	What it shows	When to use it
Table Statistics	The row counts for tables across schemas	To identify anomalies in table sizes in general, and workloads in particular

Node Metrics Breakout

Chart Name	What it shows	When to use it
CPU Utilization	The System Info CPU utilization (percent)	To view a host’s CPU utilization and hardware health to identify if processes outside of SingleStore DB could be affecting them
Filesystem	The filesystem usage (bytes)	To view host-level filesystem usage and identify if processes outside of SingleStore DB could be affecting it
Network Rate	The System Info network rate (bytes)	To view host-level network usage and identify if processes outside of SingleStore DB could be affecting it
Memory Bytes	The System Info memory usage (bytes)	To view host-level and `cgroup` memory usage and identify if processes outside of SingleStore DB could be affecting it

Node Metrics Drilldown

Chart Name	What it shows	When to use it
Node Metrics Drilldown	The `memsql_exporter` execution details	To determine if the exporter process is running efficiently and/or to identify lags in data collection

Additional information on Grafana

Troubleshoot Your Monitoring Setup

Pipelines

Check the Monitoring Tables for Data

Connect to the database.
Run the following SQL. The default database name is metrics. If your database name is different from the default name, replace metrics with your database name.
```
use metrics;
select * from metrics limit 10;
```
Optional:
```
show tables;
select * from <table-name> limit 10; -- repeat for each monitoring table listed
```
If these queries return an empty set, review the pipelines error tables using the next step.
Review the monitoring pipelines.
```
show pipelines status
```
If a monitoring pipeline (with a name resembling *_metrics or *_blobs) is in a state other than running, start the pipeline.
```
START PIPELINE <pipeline-name>
```
Check the information_schema.pipelines_errors table for errors.
```
select * from information_schema.pipelines_errors
```
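The pipeline checks above can be partly scripted. The following is a minimal sketch, not an official tool: it assumes the pipeline list has already been exported as tab-separated rows of pipeline name and state (the function name `stopped_pipelines_sql` and that input format are assumptions made for illustration), and it prints a START PIPELINE statement for each monitoring pipeline that is not running.

```shell
#!/bin/sh
# Reads tab-separated "<pipeline-name><TAB><state>" rows on stdin (an assumed
# input format) and prints a START PIPELINE statement for every monitoring
# pipeline whose state is not "running" (compared case-insensitively).
stopped_pipelines_sql() {
  awk -F '\t' '
    # Only consider pipelines whose names end in "metrics" or "blobs".
    $1 ~ /(metrics|blobs)$/ && tolower($2) != "running" {
      printf "START PIPELINE `%s`;\n", $1
    }'
}
```

One way to produce the input is a MySQL-compatible command-line client run in batch mode without column headers, assuming such a client is installed. Review the generated statements before running them against the metrics database.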

Resolve Pipeline Errors

If you receive a Cannot extract data for the pipeline error in the pipelines_errors table, perform the following steps.

Confirm that port 9104 is accessible from all hosts in the cluster. This is the default port used for monitoring. To test this, run the following command at the Linux command line and review the output.
```
curl http://<endpoint>:9104/cluster-metrics
```
For example:
```
curl http://192.168.1.100:9104/cluster-metrics
```
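Because the port must be reachable from every host, it can help to wrap the curl check in a small function and run it on each host in turn. This is a hedged sketch: the function name `check_exporter` and the five-second timeout are assumptions, while the port and the /cluster-metrics path come from the command above.

```shell
#!/bin/sh
# Prints "OK <host>:<port>" if the exporter endpoint answers with an HTTP
# success status, and "UNREACHABLE <host>:<port>" otherwise.
check_exporter() {
  host=$1
  port=${2:-9104}   # 9104 is the default monitoring port named in this section
  # --fail makes curl return non-zero on HTTP errors; --max-time bounds the wait.
  if curl --silent --fail --max-time 5 --output /dev/null \
      "http://${host}:${port}/cluster-metrics"; then
    echo "OK ${host}:${port}"
  else
    echo "UNREACHABLE ${host}:${port}"
  fi
}
```

For example, `check_exporter 192.168.1.100` tests the default port on that endpoint. An UNREACHABLE result only shows that this host could not fetch the metrics; the cause (firewall, exporter not running, wrong address) still has to be investigated separately.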

If the hostname of the Master Aggregator is localhost, and a pipeline was created using localhost, recreate the pipeline using the Master Aggregator host’s IP address. For example:

metrics pipeline:

```
create or replace pipeline `metrics` as load data prometheus_exporter
"http://<host-ip-address>:9104/cluster-metrics"
config '{"is_memsql_internal":true}'
into procedure `load_metrics` format json;

start pipeline if not running metrics;
```

blobs pipeline:

```
create or replace pipeline `blobs` as load data prometheus_exporter
"http://<host-ip-address>:9104/samples"
config '{"is_memsql_internal":true, "download_type":"samples"}'
into procedure `load_blobs` format json;

start pipeline if not running blobs;
```
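To avoid retyping the IP address in both statements, the SQL can be generated from a single argument. The sketch below is illustrative only: the function name `monitoring_pipeline_sql` is hypothetical, and the statements it prints are copied from the two examples above with the host address and port substituted in.

```shell
#!/bin/sh
# Prints the SQL that recreates both monitoring pipelines against an explicit
# host IP address. Refuses "localhost", since that is the case being fixed.
monitoring_pipeline_sql() {
  ip=$1
  port=${2:-9104}   # default monitoring port from this section
  if [ -z "$ip" ] || [ "$ip" = "localhost" ]; then
    echo "usage: monitoring_pipeline_sql <host-ip-address> [port]" >&2
    return 1
  fi
  # Backticks are escaped so the shell does not run them as command substitution.
  cat <<EOF
create or replace pipeline \`metrics\` as load data prometheus_exporter
"http://${ip}:${port}/cluster-metrics"
config '{"is_memsql_internal":true}'
into procedure \`load_metrics\` format json;

start pipeline if not running metrics;

create or replace pipeline \`blobs\` as load data prometheus_exporter
"http://${ip}:${port}/samples"
config '{"is_memsql_internal":true, "download_type":"samples"}'
into procedure \`load_blobs\` format json;

start pipeline if not running blobs;
EOF
}
```

Running `monitoring_pipeline_sql 192.168.1.100` prints both statements for review. The output can then be pasted into a session connected to the metrics database.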
