Outdated Version

You are viewing an older version of this section. View current production version.

PROFILE

Provides detailed resources usage metrics about a query.

PROFILE select_statement
SHOW PROFILE

When you run a PROFILE statement, it executes the associated SELECT statement and collects resource usage metrics about the SELECT statement.

After the query has executed, run the SHOW PROFILE statement to display the collected metrics. Like the EXPLAIN statement, SHOW PROFILE displays query execution operations as a tree, where the operator at the top is the last executed before returning the result to the client, and operators below are executed before their parents; however, SHOW PROFILE additionally displays resource usage metrics for the operators in the execution tree.

The following metrics are gathered about each when executing a statement using PROFILE:

  • Network traffic
  • Memory usage
  • Execution time
  • The actual number of rows affected by each operator in the execution tree

Not all of the above metrics will be gathered for queries where execution performance may be hindered by gathering such metrics.

You are also able to run SHOW PROFILE even if a query or profile exits with an error. In this case, the output of SHOW PROFILE will include profiling information for the operators executed before the point of failure. This may be helpful while troubleshooting the reason for the failure.

Profile Example

The PROFILE statement is particularly helpful when evaluating distributed query performance, such as a distributed join. Distributed joins that require broadcasts or repartitions are expensive, and the PROFILE statement can help you understand how such queries are executed so that they can be optimized.

In the following example, a distributed join is executed against a four node cluster to return data about customers and their orders. Before executing a PROFILE statement, it’s best to first execute an EXPLAIN statement to understand the execution path for a query. EXPLAIN may also inform you that you should first run an ANALYZE statement to ensure the database has the most up-to-date statistics on its data.

memsql> EXPLAIN SELECT COUNT(*) FROM orders o INNER JOIN customer c ON o.custkey = c.custkey;
+--------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                              |
+--------------------------------------------------------------------------------------------------------------------------------------+
| Project [CAST(COALESCE($0,0) AS SIGNED) AS `count(*)`]                                                                               |
| Aggregate [SUM(remote_0.`count(*)`) AS $0]                                                                                           |
| Gather partitions:all est_rows:1 alias:remote_0                                                                                      |
| Project [`count(*)`] est_rows:1 est_select_cost:3308565                                                                              |
| Aggregate [COUNT(*) AS `count(*)`]                                                                                                   |
| NestedLoopJoin                                                                                                                       |
| |---IndexSeek memsql_demo.customer AS c, PRIMARY KEY (custkey) scan:[custkey = r0.custkey] est_table_rows:150000 est_filtered:150000 |
| TableScan r0 storage:list stream:no                                                                                                  |
| Repartition [o.custkey] AS r0 shard_key:[custkey] est_rows:1102855                                                                   |
| TableScan memsql_demo.orders AS o, PRIMARY KEY (orderkey) est_table_rows:1102855 est_filtered:1102855                                |
+--------------------------------------------------------------------------------------------------------------------------------------+
10 rows in set (0.00 sec)

From this EXPLAIN statement, it’s clear that a repartition is required, and that a nested loop join will be performed. Now you can run the PROFILE statement with the same SELECT query to gather resource usage metrics.

memsql> PROFILE SELECT COUNT(*) FROM orders o INNER JOIN customer c ON o.custkey = c.custkey;
+----------+
| count(*) |
+----------+
| 1102855  |
+----------+
1 row in set (0.17 sec)

The PROFILE statement will output the same results as the inner SELECT statement, but it has also gathered resource usage metrics. To see the metrics, you must run the SHOW PROFILE statement.

memsql> SHOW PROFILE;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| PROFILE                                                                                                                                                   |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project [CAST(COALESCE($0,0) AS SIGNED) AS `count(*)`] actual_rows: 1 network_traffic: 0.008000 KB                                                        |
| Aggregate [SUM(remote_0.`count(*)`) AS $0] actual_rows: 32                                                                                                |
| Gather partitions:all est_rows:1 alias:remote_0 actual_rows: 32 exec_time: 71ms                                                                           |
| Project [`count(*)`] est_rows:1 est_select_cost:3308565 actual_rows: 32 network_traffic: 0.192000 KB                                                      |
| Aggregate [COUNT(*) AS `count(*)`] actual_rows: 1102855                                                                                                   |
| NestedLoopJoin actual_rows: 1102855                                                                                                                       |
| |---IndexSeek memsql_demo.customer AS c, PRIMARY KEY (custkey) scan:[custkey = r0.custkey] est_table_rows:150000 est_filtered:150000 actual_rows: 1102855 |
| TableScan r0 storage:list stream:no actual_rows: 1102855 exec_time: 70ms                                                                                  |
| Repartition [o.custkey] AS r0 shard_key:[custkey] est_rows:1102855 actual_rows: 1102855 network_traffic: 6904.042969 KB                                   |
| TableScan memsql_demo.orders AS o, PRIMARY KEY (orderkey) est_table_rows:1102855 est_filtered:1102855 actual_rows: 1102855 exec_time: 57ms                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
10 rows in set (0.00 sec)

After the SHOW PROFILE statement has been executed, you can see from the example above that actual rows, network traffic, and execution time have been appended to the end of the execution tree lines. The Repartition operation was a particularly expensive operation, requiring nearly 7MB of network traffic.