You are viewing an older version of this section. View current production version.
Vector Functions
MemSQL includes the following built-in functions for working with vectors of 32-bit floating-point numbers, represented as blobs:
Inserting Vectors as Blobs from an External Application
As a best practice when inserting binary data into a table from an external application, first convert a vector blob to its hexadecimal representation. Then in MemSQL, execute the INSERT
statement using the UNHEX()
built-in function to convert the hex representation back to its original binary format.
Consider the following example written in Python, which uses a few modules to generate vectors and convert them to hex:
import numpy
import struct
vector = numpy.random.random(1024)
vectorStr = "".join([struct.pack('f', elem) for elem in vector])
vectorStrEncoded = vectorStr.encode('hex')
When this code is executed, a random floating point vector is generated:
[0.09361789 0.22198475 0.36929942 ..., 0.97525847 0.98422799 0.26693578]
The vector is then converted to a binary string representation. Finally, the resulting vector is hex encoded:
bdbabf3df84f633ed014bd3ef6ca183ffc759b ... 48ca303e8aaa793f5ef67b3fcfab883e
Using the hex-encoded vector, you can connect to the database and execute a query that calls the UNHEX()
built-in function to convert the vector to its original binary format during insertion:
memsqlConnection.query('INSERT INTO TABLE vectors VALUES (UNHEX("%s"))' % (vectorStrEncoded))
Performance Note
DOT_PRODUCT()
, EUCLIDEAN_DISTANCE()
, and VECTOR_SUB()
are high-performance functions implemented for fast vector operations, using single-instruction multiple-data (SIMD) processor instructions. Your hardware must support AVX-2 instructions in order to use these functions.
Use Cases
A common use case for these functions is to work with floating-point vectors that describe images, such as images of faces or products. These vectors can be created using a neural network trained to categorize the images. Objects are inserted into a table. One field of the object is the vector that describes the image, obtained by feeding the image into the neural network. Queries using DOT_PRODUCT()
or EUCLIDEAN_DISTANCE()
can then be run to find closest matches. For example, the following query retrieves the 10 closest matches to the constant vector in the query, based on a cosine similarity match implemented using DOT_PRODUCT()
.
select t.*, dot_product(t.vector, unhex("...constant vector...")) as score
from t
order by score desc
limit 10;