Outdated Version

You are viewing an older version of this section. View current production version.

DOT_PRODUCT

Returns the scalar product, or dot product, of two vector input values.

Syntax

DOT_PRODUCT(vector_expression, vector_expression)

vector_expression

An expression that evaluates to a vector. The vector must be encoded as a blob containing packed single-precision floating-point numbers in little-endian byte order. A vector can be of any length, but both input vectors must be the same length and the blob lengths must be divisible by 4 bytes.

Remarks

DOT_PRODUCT() can be used to compute a cosine similarity metric of the two input vectors, if the input vectors are normalized to length 1.

To execute this function, the host processor must support AVX2 instruction set extensions. If AVX2 is not supported, an error will occur during execution.

If the result of DOT_PRODUCT() is infinity, negative infinity, or not a number (NaN), NULL will be returned instead.

Examples

Example: SELECT Using DOT_PRODUCT() on Existing Rows

The following example executes DOT_PRODUCT() on a row containing two vectors. The HEX() built-in function is also used to return a readable form of the binary output.

Create a table with two BLOB-typed columns:

memsql> CREATE TABLE t (a BLOB, b BLOB);
Query OK, 0 rows affected (0.26 sec)

Using the JSON_ARRAY_PACK() built-in function to easily insert properly formatted vectors, insert a row with each vector in a different column:

memsql> INSERT INTO t VALUES (JSON_ARRAY_PACK('[0.7, 0.2, 1.7]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'));
Query OK, 1 row affected (0.22 sec)

To demonstrate the contents of the table, use the HEX() built-in function to return a readable form of the binary data:

memsql> SELECT HEX(t.a), HEX(t.b) FROM t;
+--------------------------+--------------------------+
| HEX(t.a)                 | HEX(t.b)                 |
+--------------------------+--------------------------+
| 3333333FCDCC4C3E9A99D93F | 0000803F0000003F00000040 |
+--------------------------+--------------------------+
1 row in set (0.15 sec)

Query the table using the DOT_PRODUCT() function in a SELECT statement:

memsql> SELECT DOT_PRODUCT(t.a, t.b) FROM t;
+-----------------------+
| DOT_PRODUCT(t.a, t.b) |
+-----------------------+
|     4.200000084936619 |
+-----------------------+
1 row in set (0.14 sec)

Example: DOT_PRODUCT() with JSON_ARRAY_PACK()

The JSON_ARRAY_PACK() built-in function makes it easier to add properly-formatted vectors as input parameters to the DOT_PRODUCT() function. Consider the example below:

SELECT DOT_PRODUCT(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'))
+-------------------------------------------------------------------------------------+
| DOT_PRODUCT(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]')) |
+-------------------------------------------------------------------------------------+
|                                                                                5.25 |
+-------------------------------------------------------------------------------------+
1 row in set (0.10 sec)

Example: Floating Point Deviation in DOT_PRODUCT() Function

The DOT_PRODUCT function accepts FLOAT data type as an input and returns a DOUBLE. The output does not have exact DOUBLE precision because of the FLOAT input.

Note: It is advisable not to directly compare floating point values for equality.

For example, two vectors are provided as input parameters to the DOT_PRODUCT() function.

INSERT INTO t1 VALUES(JSON_ARRAY_PACK('[0.6,0.8]'), JSON_ARRAY_PACK('[0.6,0.8]'));
SELECT DOT_PRODUCT(c1, c2) FROM t1;
****
+---------------------+
| DOT_PRODUCT(c1, c2) |
+---------------------+
|  1.0000000596046448 |
+---------------------+

In this example, the output is not 1.0 as would be expected. The output 1.0 would be possible only if the input vector values are perfectly representable in floating point.

Related Topics