check

Info

MemSQL Helios does not support this command.

Checks the provided report for issues.

Usage

Checks the provided report for issues.

Available Checkers:

+-----------------------------------+----------+----------------------------------------------------------------------------------+
|                ID                 | EXCLUDED |                                   DESCRIPTION                                    |
+-----------------------------------+----------+----------------------------------------------------------------------------------+
| attachRebalanceDelay              |          | This variable should be set to 120 (default). If it is set to another value, the |
|                                   |          | cluster may experience delays in self-healing operations                         |
| autoAttach                        |          | This variable should be set to "ON" (default). An "OFF" value prevents the       |
|                                   |          | nodes from reattaching after restart                                             |
| blockedQueries                    |          | Blocked queries may lead to additional failed operations. We recommend that you  |
|                                   |          | reduce your workload or kill running queries                                     |
| cgroupDisabled                    |          | Linux memory subsystems use a number of bytes of memory per physical page on     |
|                                   |          | x86_64 systems. These resources are consumed even when memory is not used in     |
|                                   |          | any hierarchy. As SingleStoreDB doesn't use the memory subsystem, we recommend   |
|                                   |          | disabling this as it will reduce the resource consumption of the kernel          |
| cgroupMemoryUsage                 |          | Node processes may be run within a cgroup with memory limits; exceeding those    |
|                                   |          | limits may lead to decreased performance and/or node failure                     |
| chronydDisabled                   |          | We recommend that chronyd is disabled so that ntpd can be used for time          |
|                                   |          | synchronization. Contact your administrator to disable chronyd                   |
| clusterMemoryUsage                |          | As SingleStoreDB is allocated the value specified in maximum_memory, query       |
|                                   |          | failures may result if memory usage approaches this limit. To alleviate this     |
|                                   |          | condition (for the short term), increase maximum_memory or delete data which     |
|                                   |          | is being stored in memory to allow more headroom                                 |
| collectionErrors                  |          | Collection errors in the report typically indicate that all parts of the report  |
|                                   |          | could not be gathered. This could mean that some information may be missing      |
|                                   |          | and a thorough check could not be performed, or that Toolbox cannot access the   |
|                                   |          | required information                                                             |
| columnstoreSegmentRows            |          | Inconsistent columnstore segment rows can lead to non-optimal query performance  |
|                                   |          | or other issues. Columnstore segment rows refers to the number of rows           |
|                                   |          | SingleStoreDB holds in each segment. The default value is 1024000. Refer to      |
|                                   |          | https://docs.singlestore.com/docs/managing-columnstore-segments/ for more        |
|                                   |          | information                                                                      |
| consistentMaxMemory               |          | Inconsistent maximum memory settings will lead to some nodes having more or less |
|                                   |          | memory available for operations and can cause performance inconsistencies across |
|                                   |          | the cluster. We recommend that all max_memory settings are consistent. Refer to  |
|                                   |          | https://docs.singlestore.com/docs/configure-memory-limits/ for more information  |
| cpuFeatures                       |          | SingleStoreDB can make use of AVX2 instructions for optimal performance. Refer   |
|                                   |          | to https://docs.singlestore.com/docs/instruction-set-verification/ for more      |
|                                   |          | information                                                                      |
| cpuFreqPolicy                     | EXCLUDED | Disabling power saving and Turbo Mode settings on all hosts will lead to more    |
|                                   |          | consistent performance across the cluster                                        |
| cpuHyperThreading                 |          | A CPU with hyperthreading will ensure optimal performance. Hyperthreading allows |
|                                   |          | a CPU to split a physical core into two virtual cores, or "threads." This allows |
|                                   |          | each core to do two things simultaneously                                        |
| cpuIdle                           |          | In general, SingleStore recommends utilizing all of the cores available on       |
|                                   |          | a host. However, if a CPU is frequently less than 5% idle, this typically        |
|                                   |          | indicates that your workload will not have room to grow, and more cores are      |
|                                   |          | likely required                                                                  |
| cpuMemoryBandwidth                |          | Low CPU-memory bandwidth can highlight potential performance issues on your      |
|                                   |          | hosts                                                                            |
| cpuModel                          |          | Differing CPU models may lead to inconsistent performance                        |
| defaultVariables                  |          | We recommend keeping the default values for these variables for optimal cluster  |
|                                   |          | operation                                                                        |
| defaultWorkloadManagement         |          | We recommend keeping the default values for the workload management settings for |
|                                   |          | optimal cluster operation                                                        |
| defunctProcesses                  |          | Defunct processes may be using system resources and preventing their use by      |
|                                   |          | SingleStoreDB. It is recommended that you kill these processes if possible       |
| delayedThreadLaunches             |          | Delayed thread launches may indicate that a workload is too intensive for the    |
|                                   |          | available threads. We recommend decreasing the cluster's workload                |
| detectCrashStackTraces            |          | The presence of dmp.stack files indicates that a SingleStoreDB node has crashed, |
|                                   |          | which should be investigated                                                     |
| disconnectedReplicationSlaves     |          | Disconnected replication slaves may mean that you don't have full redundancy in  |
|                                   |          | your system                                                                      |
| diskBandwidth                     |          | Disk bandwidth, an indicator of disk performance, is computed by examining the   |
|                                   |          | total bytes transferred between the first request for service and the completion |
|                                   |          | of the transfer                                                                  |
| diskInodesUsage                   |          | Exhausting the inode capacity can lead to the inability to store and/or retrieve |
|                                   |          | data. To alleviate this potential issue, either increase the inode capacity, or  |
|                                   |          | reduce the inode usage                                                           |
| diskLatencyRead                   |          | Disk latency is an important performance indicator when reading data.            |
|                                   |          | SingleStore recommends investigating potential disk performance issues when the  |
|                                   |          | disk's "read" latency is greater than 10 ms                                      |
| diskLatencyWrite                  |          | Disk latency is an important performance indicator when writing data.            |
|                                   |          | SingleStore recommends investigating potential disk performance issues when the  |
|                                   |          | disk's "write" latency is greater than 10 ms                                     |
| diskUsage                         |          | Checks free disk space and identifies if you are approaching your disk capacity  |
|                                   |          | limits                                                                           |
| duplicatePartitionDatabase        |          | Duplicate partitions may cause extra memory or disk usage in your system         |
| explainRebalancePartitionsChecker |          | If the cluster isn't properly rebalanced (where EXPLAIN REBALANCE PARTITIONS is  |
|                                   |          | not null), partitions are not distributed evenly across the cluster. An uneven   |
|                                   |          | partition distribution can lead to nodes containing more data and/or performing  |
|                                   |          | more work (leading to "hotspots"). To remedy, run REBALANCE PARTITIONS. Refer to |
|                                   |          | https://docs.singlestore.com/docs/rebalance-partitions/ for more information     |
| failedBackgroundThreadAllocations |          | Failed background thread allocations can lead to further cascading cluster       |
|                                   |          | issues. It is recommended you scale back your workload when you see these        |
|                                   |          | failures                                                                         |
| failedCodegen                     |          | Code generation errors indicate that your SQL was not properly compiled. We      |
|                                   |          | recommend that you review and correct the query that caused the code generation  |
|                                   |          | error                                                                            |
| failureDetectionOn                |          | SingleStoreDB nodes will not properly fail over if failure detection is set to   |
|                                   |          | OFF. To ensure that SingleStoreDB nodes will properly fail over, set failure     |
|                                   |          | detection to ON                                                                  |
| filesystemType                    |          | Unsupported file systems may cause unpredictable results. Please                 |
|                                   |          | ensure your cluster is deployed on a supported filesystem. Refer to              |
|                                   |          | https://docs.singlestore.com/docs/system-requirements/columnstore-performance/   |
|                                   |          | for more information                                                             |
| highAvailability                  |          | High availability mode distributes leaf nodes among availability groups such     |
|                                   |          | that paired leaves do not share the same host                                    |
| installedPermissions              |          | Specific file ownership permissions are required to run SingleStoreDB. This      |
|                                   |          | check ensures that the permissions are set properly so that SingleStoreDB can    |
|                                   |          | operate without issue                                                            |
| interpreterMode                   |          | We recommend setting the interpreter mode to interpret_first. When               |
|                                   |          | set, SingleStoreDB interprets and compiles a query shape in parallel             |
|                                   |          | as the query is encountered rather than compiling it first. Refer to             |
|                                   |          | https://docs.singlestore.com/docs/code-generation/interpreter-modes/ for more    |
|                                   |          | information                                                                      |
| kernelVersions                    |          | Inconsistent kernel versions are not recommended                                 |
| leafAverageRoundtripLatency       |          | If leaf round-trip latency is high, we recommend checking your network           |
|                                   |          | connectivity between hosts                                                       |
| leavesNotOnline                   |          | Offline leaf nodes may indicate a cluster issue. If high availability is not     |
|                                   |          | enabled, the databases will be inaccessible                                      |
| longRunningQueries                |          | Long-running queries may indicate that the cluster's workload is too high. We    |
|                                   |          | recommend checking the cluster's workload for long-running queries and killing   |
|                                   |          | them                                                                             |
| majorPageFaults                   |          | Memory pressure is an indicator that a host's memory is unable to efficiently    |
|                                   |          | service processing needs. Frequent page faults on a host are a sign of memory    |
|                                   |          | pressure                                                                         |
| mallocActiveMemory                |          | Shows the memory allocated directly from the operating system and managed by     |
|                                   |          | the C runtime allocators (not SingleStoreDB’s built-in memory allocators that    |
|                                   |          | use the Buffer Manager). In this case, the memory use should be approximately 1  |
|                                   |          | - 2 GBs for most workloads. If larger, we recommend investigating the system's   |
|                                   |          | memory use                                                                       |
| maxMapCount                       |          | Incorrectly setting this can lead to memory errors. Refer to                     |
|                                   |          | https://docs.singlestore.com/memsql-report-redir/configure-linux-vm-settings for |
|                                   |          | more information                                                                 |
| maxMemorySettings                 |          | We recommend setting the maximum memory to a percentage of the host's total      |
|                                   |          | memory, with a ceiling of 90%                                                    |
| maxOpenFiles                      |          | A setting lower than the recommended setting can significantly                   |
|                                   |          | degrade performance and introduce connection limit errors. Refer to              |
|                                   |          | https://docs.singlestore.com/memsql-report-redir/configure-linux-vm-settings for |
|                                   |          | more information                                                                 |
| memoryCommitted                   |          | Virtual memory can potentially be overallocated, and exceed a host's physical    |
|                                   |          | memory. This can lead to workload failures due to memory pressure                |
| memsqlVersions                    |          | We recommend that the deployed version of SingleStoreDB is consistent across     |
|                                   |          | all hosts and nodes                                                              |
| minFreeKbytes                     |          | Setting these to the recommended values will minimize                            |
|                                   |          | the likelihood of memory errors on your hosts. Refer to                          |
|                                   |          | https://docs.singlestore.com/memsql-report-redir/configure-linux-vm-settings for |
|                                   |          | more information                                                                 |
| missingClusterDb                  |          | The cluster database holds all the metadata for your cluster. A missing cluster  |
|                                   |          | database requires immediate intervention and potentially a refresh of your       |
|                                   |          | cluster via backup/restore                                                       |
| networkBuffersMax                 |          | wmem_max and rmem_max are network settings that control the send and receive     |
|                                   |          | socket buffer sizes, respectively. If these parameters are set too low, you may  |
|                                   |          | experience latency. It is recommended to set each of these values to a minimum   |
|                                   |          | of 8MB                                                                           |
| numaConfiguration                 |          | When running SingleStoreDB on hosts that support Non-Uniform Memory              |
|                                   |          | Access (NUMA) sockets, we recommend configuring SingleStoreDB                    |
|                                   |          | for NUMA via numactl for optimal performance. Refer to                           |
|                                   |          | https://docs.singlestore.com/studio-redir/memsql-deploy-configure-numa/ for more |
|                                   |          | information                                                                      |
| offlineAggregators                |          | Offline aggregators must be addressed as less work will be load-balanced across  |
|                                   |          | the cluster                                                                      |
| orchestratorProcesses             |          | Orchestrator processes may cause undesired actions to be taken on SingleStoreDB  |
|                                   |          | hosts which may negatively impact the cluster                                    |
| orphanDatabases                   |          | Orphan databases, while unused, still consume memory. Orphan databases           |
|                                   |          | can and should be cleared using CLEAR ORPHAN DATABASES. Refer to                 |
|                                   |          | https://docs.singlestore.com/docs/clear-orphan-databases/ for more information   |
| orphanTables                      |          | Orphan tables, while unused, still consume memory. Orphan tables                 |
|                                   |          | can and should be cleared using CLEAR ORPHAN DATABASES. Refer to                 |
|                                   |          | https://docs.singlestore.com/docs/clear-orphan-databases/ for more information   |
| outOfMemory                       |          | Out-of-memory errors may indicate memory pressure on the cluster.                |
|                                   |          | We recommend identifying and reducing memory usage. Refer to                     |
|                                   |          | https://docs.singlestore.com/docs/identifying-reducing-memory-usage/ for more    |
|                                   |          | information                                                                      |
| partitionsConsistency             |          | We recommend that SSD partitions start at a minimum of 4096 byte-sectors. Disk   |
|                                   |          | performance issues may result if this value is inconsistent across hosts, or if  |
|                                   |          | the partition starts at < 4096 byte-sectors                                      |
| pendingDatabases                  |          | Pending databases are available for read and write queries. Databases that       |
|                                   |          | remain in a "pending" state for an extended period should be investigated        |
| queuedQueries                     |          | A large number of queued queries may indicate a high cluster workload. We        |
|                                   |          | recommend reducing the workload and/or killing long-running queries              |
| readyQueueSaturated               |          | Ready Queue saturation indicates there aren't enough connection threads          |
|                                   |          | available to handle the workload. We recommend reducing the workload and/or      |
|                                   |          | killing long-running queries                                                     |
| replicationLag                    |          | Checks if the replication on the secondary cluster is out of sync with the       |
|                                   |          | primary cluster                                                                  |
| replicationPausedDatabases        |          | Identifies if PAUSE REPLICATION has been run and provides a status               |
| runningAlterOrTruncate            |          | A running ALTER or TRUNCATE command may explain why the cluster is experiencing  |
|                                   |          | issues when attempting to run queries                                            |
| runningBackup                     |          | This informational check can help troubleshoot issues caused by running a backup |
| secondaryDatabases                |          | This informational check can help determine if the cluster is the primary        |
|                                   |          | cluster, or a secondary/replicated one                                           |
| securityLimits                    |          | Checks that the nproc and NOFILE limits in /etc/security/limits.conf are at      |
|                                   |          | least 128000 and 1024000, respectively                                           |
| swapEnabled                       |          | This check determines if there is adequate swap space on a host, where 10% or    |
|                                   |          | more of physical memory is typically allocated for swap. Swap space will be      |
|                                   |          | utilized when the host is under memory pressure                                  |
| swapUsage                         |          | Your host may be under memory pressure if the swap space that is actively being  |
|                                   |          | used is greater than 5%                                                          |
| syncCnfVariables                  |          | If sync variables are not set in the engine, there will be discrepancies between |
|                                   |          | what the cnf file contains and what the associated values actually are           |
| tracelogHardShutdown              |          | Search for nodes that have sustained a hard shutdown (where a node's host has    |
|                                   |          | crashed or lost power)                                                           |
| tracelogOOD                       |          | Out of disk space                                                                |
| tracelogOOM                       |          | Out of memory                                                                    |
| transparentHugepage               |          | Disable transparent huge pages (THP) for optimal SingleStoreDB performance.      |
|                                   |          | Refer to https://docs.singlestore.com/memsql-report-redir/transparent-hugepage/  |
|                                   |          | for more information                                                             |
| unkillableQueries                 |          | Indicates that there are queries running on your cluster that can't be killed.   |
|                                   |          | This may be due to long-running processes that have rendered other processes     |
|                                   |          | unkillable. We recommend identifying long-running processes using SHOW           |
|                                   |          | PROCESSLIST and killing them                                                     |
| unmappedMasterPartitions          |          | Use ATTACH PARTITIONS to reattach disconnected partitions to the cluster. Refer  |
|                                   |          | to https://docs.singlestore.com/docs/attach-partition/ for more information      |
| unrecoverableDatabases            |          | An unrecoverable database is no longer readable or writeable                     |
| userDatabaseRedundancy            |          | The absence of redundancy indicates that not all partitions                      |
|                                   |          | have replicas that they can failover to. We recommend running                    |
|                                   |          | EXPLAIN RESTORE REDUNDANCY and restoring if possible. Refer to                   |
|                                   |          | https://docs.singlestore.com/docs/restore-redundancy/ for more information       |
| validLicense                      |          | A valid and properly applied license is required to comply with SingleStoreDB    |
|                                   |          | terms and conditions                                                             |
| validateSsd                       |          | SingleStoreDB must be deployed and run on SSDs                                   |
| versionHashes                     |          | Confirms that a SingleStoreDB version is a General Availability (GA) release     |
| vmOvercommit                      |          | By design, Linux kills processes that are consuming large amounts of memory when |
|                                   |          | the amount of free memory is deemed to be too low. Overcommit settings that are  |
|                                   |          | set too low may cause frequent and unnecessary failures                          |
| vmSwappiness                      |          | The vm.swappiness value affects system performance as it controls when swapping  |
|                                   |          | is activated, and how swap space is used. When set to lower values, the kernel   |
|                                   |          | will use less swap space. When set to higher values, the kernel will use         |
|                                   |          | more swap space. While the range of acceptable values is from 0 to 100, the      |
|                                   |          | recommended value is from 1 to 10, and should never be set to 0.                 |
| whitespacesInObjectName           |          | It is a bad practice to create a database object with leading or trailing        |
|                                   |          | whitespace because it is handled as a separate object from an identical one      |
|                                   |          | without whitespace                                                               |
+-----------------------------------+----------+----------------------------------------------------------------------------------+


Examples:

# Run a single checker
sdb-report check --only orchestratorProcesses

# Run pre-SingleStoreDB install environment checks only. Use this command with sdb-report collect --validate-env
sdb-report check --validate-env

# Exclude specific checkers
sdb-report check --exclude minFreeKbytes --exclude maxOpenFiles

Usage:
  sdb-report check [flags]

Flags:
      --exclude VALUES              Exclude the specified checkers
      --exclude-global              Exclude global collectors from the report file checker
  -h, --help                        Help for check
      --include VALUES              Include the specified checkers
      --include-performance         Include checkers that create load on cluster (not recommended for active clusters)
      --mask-ip                     Mask usernames, hostnames, IP and MAC addresses in the report file checker
      --only VALUES                 Only run the specified checkers
  -i, --report-path ABSOLUTE_PATH   Read the report from the specified tarball or directory. If you do not already have a report, run 'sdb-report collect' to generate one
      --show-skips                  Display more information about skipped checks
      --validate-env                Run checkers that do not require SingleStoreDB installation (performance checkers included)

Global Flags:
      --backup-cache FILE_PATH                File path for the backup cache
      --cache-file FILE_PATH                  File path for the Toolbox node cache
  -c, --config FILE_PATH                      File path for the Toolbox configuration
      --disable-colors                        Disable color output in console, which some terminal sessions/environments may have difficulty with
      --disable-spinner                       Disable the progress spinner, which some terminal sessions/environments may have issues with
  -j, --json                                  Enable JSON output
      --parallelism POSITIVE_INTEGER          Maximum number of operations to run in parallel
      --runtime-dir DIRECTORY_PATH            Where to store Toolbox runtime data
      --ssh-control-persist SECONDS           Enable SSH ControlPersist and set it to the specified duration in seconds
      --ssh-max-sessions POSITIVE_INTEGER     Maximum number of SSH sessions to open per host, must be at least 3
      --ssh-strict-host-key-checking          Enable strict host key checking for SSH connections
      --ssh-user-known-hosts-file FILE_PATH   Path to the user known_hosts file for SSH connections. If not set, /dev/null will be used
      --state-file FILE_PATH                  Toolbox state file path
  -v, --verbosity count                       Increase logging verbosity: valid values are 1, 2, 3. Usage -v=count or --verbosity=count
  -y, --yes                                   Enable non-interactive mode and assume the user would like to move forward with the proposed actions by default

Pre-installation Environment Validation

Before installing SingleStore DB on hosts, validate that the deployment environment has sufficient resources and a suitable configuration so that the database performs as well as possible. The sdb-report tool provides a set of collectors and pre-installation environment checks that help you tune your hosts for SingleStore DB. The pre-installation checks run only against components that apply to hosts on which SingleStore DB is not yet installed.

If you are deploying SingleStore DB using SingleStore Tools, the sdb-report tool will automatically check the system and flag potential configuration changes prior to the SingleStore DB deployment. The pre-installation checks can also be run manually on new hosts that are added to a cluster as part of database scaling or on existing hosts in a cluster.

To manually check whether your environment is ready for a SingleStore DB deployment, perform the following steps.

  1. Install SingleStore Tools and register the hosts on which you plan to deploy SingleStore DB.

  2. Run the following command at the command line.

    sdb-report collect --validate-env
    

    This command collects a report on the pre-installation checks from all the registered hosts. You can collect a report from specific hosts by using the --host flag.

  3. After the report has been collected, run the following command at the command line. Make sure to provide the path to the report collected in the previous step.

    sdb-report check --validate-env --report-path <report-name/including/path>
    

    This command reports each pre-installation check as PASS, WARN, or FAIL and flags any configuration changes that you need to make before proceeding with the SingleStore DB deployment.

    Below is a sample output of the pre-installation checks.

    sdb-report check --validate-env --report-path report-2021-05-04T075311.tar.gz
    
    ****
    
    ✓ diskUsage ..................................... [PASS]
    ✓ collectionErrors .............................. [PASS]
    ✓ defunctProcesses .............................. [PASS]
    ✓ diskLatencyRead ............................... [PASS]
    ✓ swapUsage ..................................... [PASS]
    ✓ diskLatencyWrite .............................. [PASS]
    ✓ cpuFeatures ................................... [PASS]
    ✓ cpuIdle ....................................... [PASS]
    ✘ transparentHugepage ........................... [FAIL]
    FAIL /sys/kernel/mm/transparent_hugepage/defrag is [madvise] on 172.0.0.1
    NOTE https://docs.memsql.com/memsql-report-redir/transparent-hugepage
    ✓ validateSsd ................................... [PASS]
    ✓ memoryCommitted ............................... [PASS]
    ✓ networkBuffersMax ............................. [PASS]
    ✓ orchestratorProcesses ......................... [PASS]
    ✓ majorPageFaults ............................... [PASS]
    ✓ cpuFreqPolicy ................................. [PASS]
    NOTE cpu freq info collector on 172.0.0.1 had non-empty stderr output, can be found at cpuFreqInfo/cpuFreqInfo_stderr
    NOTE wasn't able to get powersave state on 172.0.0.1 due to cpufreq driver disabled for your kernel
    ✓ kernelVersions ................................ [PASS]
    NOTE 3.16 on all
    ✘ diskBandwidth ................................. [WARN]
    WARN disk bandwidth collection error for host 172.0.0.1: Cannot collect disk bandwidth info because stress-ng is unavailable
    ✓ cpuModel ...................................... [PASS]
    NOTE Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz on all
    ✓ minFreeKbytes ................................. [PASS]
    ✓ vmOvercommit .................................. [PASS]
    ✓ swapEnabled ................................... [PASS]
    ✓ cpuHyperThreading ............................. [PASS]
    ✘ maxMapCount ................................... [FAIL]
    FAIL vm.max_map_count = 65530 too low on 172.0.0.1
    ✘ cpuMemoryBandwidth ............................ [WARN]
    WARN cpu-memory bandwidth collector on 172.0.0.1 encountered error: Impossible to collect cpu-memory bandwidth info due to mlc unavailable
    ✓ cgroupDisabled ................................ [PASS]
    some checks failed: 21 PASS, 2 WARN, 2 FAIL
    

    The diagnostics from the sample report recommend the following actions based on the best practices for using SingleStore DB.

    • Increase the kernel virtual memory setting vm.max_map_count (65530 on the sample host, which the check reports as too low) to reduce the risk of memory errors.

    • Disable Transparent Huge Pages (THP) so that query performance is consistent.

    Similarly, you can gather the necessary configuration changes from the report and then tune your environment for a SingleStore DB deployment.
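
Before changing anything, it can help to see the current values of the two settings that fail in the sample report. The following is a minimal sketch, assuming a Linux host with the standard sysctl utility and the usual sysfs location for Transparent Huge Pages. The helper name thp_active_value is hypothetical and is not part of sdb-report.

```shell
#!/bin/sh
# Hypothetical helper (not part of sdb-report): print the active option from a
# transparent_hugepage setting line. The kernel marks the active option with
# square brackets, for example "always defer defer+madvise [madvise] never".
thp_active_value() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Show the current vm.max_map_count (the sample report flags 65530 as too low).
sysctl -n vm.max_map_count 2>/dev/null || echo "vm.max_map_count: not readable"

# Show the active THP modes, if this kernel exposes them.
for f in /sys/kernel/mm/transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/defrag; do
    if [ -r "$f" ]; then
        printf '%s: %s\n' "$f" "$(thp_active_value < "$f")"
    fi
done
```

The script only reads values and changes nothing on the host. After adjusting the settings, rerun the collect and check commands from steps 2 and 3 to confirm that the checks pass.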

Remarks

This command is interactive unless you use either the --yes or the --json flag to override the interactive behavior.
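
Because --yes removes the prompts, the text output can be post-processed in a script. The sketch below reduces check output to the checkers that did not pass. It assumes the line layout shown in the sample output above (marker, checker ID, dots, bracketed status), which may differ between Toolbox versions. The function name failed_checks is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of sdb-report): read check output on stdin and
# print "STATUS checkerID" for every checker whose result is FAIL or WARN.
failed_checks() {
    awk '$NF == "[FAIL]" || $NF == "[WARN]" {
        # Strip the surrounding brackets from the status field.
        print substr($NF, 2, length($NF) - 2), $2
    }'
}

# Demonstration on a few lines copied from the sample output above.
failed_checks <<'EOF'
✓ diskUsage ..................................... [PASS]
✘ transparentHugepage ........................... [FAIL]
FAIL /sys/kernel/mm/transparent_hugepage/defrag is [madvise] on 172.0.0.1
✘ diskBandwidth ................................. [WARN]
EOF
```

With a real report, the same function could be fed by a pipeline such as sdb-report check --validate-env --report-path <report-name/including/path> --yes --disable-colors | failed_checks. Disabling colors is suggested here on the assumption that color codes would otherwise interfere with the field matching.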