You are viewing an older version of this section. View current production version.
APPROX_COUNT_DISTINCT
An aggregate function. Returns an estimate of the number of distinct values in a collection.
APPROX_COUNT_DISTINCT
is useful if an exact result is not required. Compared to SELECT COUNT(DISTINCT <column>)
, which calculates the exact number of distinct values in a column of table, APPROX_COUNT_DISTINCT
can run much faster and consume significantly less memory.
Syntax
APPROX_COUNT_DISTINCT ( expr )
Arguments
expr
An expression evaluating to a scalar value.
expr
is typically the name of a column. In this case, APPROX_COUNT_DISTINCT
returns an estimate of the number of distinct values in the column.
Examples
The following example populates a column with one-hundred thousand unique integers and runs SELECT COUNT(DISTINCT <column>)
and APPROX_COUNT_DISTINCT
on the column so you can compare the results.
DROP DATABASE IF EXISTS memsql_docs_example;
CREATE DATABASE memsql_docs_example;
USE memsql_docs_example;
DELIMITER //
CREATE PROCEDURE populate_table() AS
BEGIN
CREATE TABLE t(a INT);
FOR i IN 1..100000 LOOP
INSERT INTO t VALUES(i);
END LOOP;
END
//
DELIMITER ;
CALL populate_table();
Get the exact count of distinct values in the column a
:
SELECT COUNT(DISTINCT a) FROM t;
****
+--------------------+
| COUNT(DISTINCT a) |
+--------------------+
| 100000 |
+--------------------+
Get an estimate of the number of distinct values in the column a
:
SELECT APPROX_COUNT_DISTINCT(a) FROM t;
****
+--------------------------+
| APPROX_COUNT_DISTINCT(a) |
+--------------------------+
| 100498 |
+--------------------------+