You are viewing an older version of this section. View current production version.
PROFILE
Provides detailed resources usage metrics about a query.
PROFILE select_statement
SHOW PROFILE
When you run a PROFILE
statement, it executes the associated SELECT
statement and collects resource usage metrics about the SELECT
statement.
After the query has executed, run the SHOW PROFILE
statement to display the collected metrics. Like the EXPLAIN statement, SHOW PROFILE
displays query execution operations as a tree, where the operator at the top is the last executed before returning the result to the client, and operators below are executed before their parents; however, SHOW PROFILE
additionally displays resource usage metrics for the operators in the execution tree.
The following metrics are gathered about each when executing a statement using PROFILE
:
- Network traffic
- Memory usage
- Execution time
- The actual number of rows affected by each operator in the execution tree
Not all of the above metrics will be gathered for queries where execution performance may be hindered by gathering such metrics.
You are also able to run SHOW PROFILE
even if a query or profile exits with an error. In this case, the output of SHOW PROFILE
will include profiling information for the operators executed before the point of failure. This may be helpful while troubleshooting the reason for the failure.
Profile Example
The PROFILE
statement is particularly helpful when evaluating distributed query performance, such as a distributed join. Distributed joins that require broadcasts or repartitions are expensive, and the PROFILE
statement can help you understand how such queries are executed so that they can be optimized.
In the following example, a distributed join is executed against a four node cluster to return data about customers and their orders. Before executing a PROFILE
statement, it’s best to first execute an EXPLAIN
statement to understand the execution path for a query. EXPLAIN
may also inform you that you should first run an ANALYZE
statement to ensure the database has the most up-to-date statistics on its data.
memsql> EXPLAIN SELECT COUNT(*) FROM orders o INNER JOIN customer c ON o.custkey = c.custkey;
+--------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------------------------------------------------+
| Project [CAST(COALESCE($0,0) AS SIGNED) AS `count(*)`] |
| Aggregate [SUM(remote_0.`count(*)`) AS $0] |
| Gather partitions:all est_rows:1 alias:remote_0 |
| Project [`count(*)`] est_rows:1 est_select_cost:3308565 |
| Aggregate [COUNT(*) AS `count(*)`] |
| NestedLoopJoin |
| |---IndexSeek memsql_demo.customer AS c, PRIMARY KEY (custkey) scan:[custkey = r0.custkey] est_table_rows:150000 est_filtered:150000 |
| TableScan r0 storage:list stream:no |
| Repartition [o.custkey] AS r0 shard_key:[custkey] est_rows:1102855 |
| TableScan memsql_demo.orders AS o, PRIMARY KEY (orderkey) est_table_rows:1102855 est_filtered:1102855 |
+--------------------------------------------------------------------------------------------------------------------------------------+
10 rows in set (0.00 sec)
From this EXPLAIN
statement, it’s clear that a repartition is required, and that a nested loop join will be performed. Now you can run the PROFILE
statement with the same SELECT
query to gather resource usage metrics.
memsql> PROFILE SELECT COUNT(*) FROM orders o INNER JOIN customer c ON o.custkey = c.custkey;
+----------+
| count(*) |
+----------+
| 1102855 |
+----------+
1 row in set (0.17 sec)
The PROFILE
statement will output the same results as the inner SELECT
statement, but it has also gathered resource usage metrics. To see the metrics, you must run the SHOW PROFILE
statement.
memsql> SHOW PROFILE;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| PROFILE |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project [CAST(COALESCE($0,0) AS SIGNED) AS `count(*)`] actual_rows: 1 network_traffic: 0.008000 KB |
| Aggregate [SUM(remote_0.`count(*)`) AS $0] actual_rows: 32 |
| Gather partitions:all est_rows:1 alias:remote_0 actual_rows: 32 exec_time: 71ms |
| Project [`count(*)`] est_rows:1 est_select_cost:3308565 actual_rows: 32 network_traffic: 0.192000 KB |
| Aggregate [COUNT(*) AS `count(*)`] actual_rows: 1102855 |
| NestedLoopJoin actual_rows: 1102855 |
| |---IndexSeek memsql_demo.customer AS c, PRIMARY KEY (custkey) scan:[custkey = r0.custkey] est_table_rows:150000 est_filtered:150000 actual_rows: 1102855 |
| TableScan r0 storage:list stream:no actual_rows: 1102855 exec_time: 70ms |
| Repartition [o.custkey] AS r0 shard_key:[custkey] est_rows:1102855 actual_rows: 1102855 network_traffic: 6904.042969 KB |
| TableScan memsql_demo.orders AS o, PRIMARY KEY (orderkey) est_table_rows:1102855 est_filtered:1102855 actual_rows: 1102855 exec_time: 57ms |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
10 rows in set (0.00 sec)
After the SHOW PROFILE
statement has been executed, you can see from the example above that actual rows, network traffic, and execution time have been appended to the end of the execution tree lines. The Repartition
operation was a particularly expensive operation, requiring nearly 7MB of network traffic.