Outdated Version

You are viewing an older version of this section. View current production version.

Vector Functions

MemSQL includes the following built-in functions for working with vectors of 32-bit floating-point numbers, represented as blobs:

Inserting Vectors as Blobs from an External Application

As a best practice when inserting binary data into a table from an external application, first convert a vector blob to its hexadecimal representation. Then in MemSQL, execute the INSERT statement using the UNHEX() built-in function to convert the hex representation back to its original binary format.

Consider the following example written in Python, which uses a few modules to generate vectors and convert them to hex:

import numpy
import struct

vector = numpy.random.random(1024)
vectorStr = "".join([struct.pack('f', elem) for elem in vector])
vectorStrEncoded = vectorStr.encode('hex')

When this code is executed, a random floating point vector is generated:

[0.09361789  0.22198475  0.36929942 ...,  0.97525847  0.98422799  0.26693578]

The vector is then converted to a binary string representation. Finally, the resulting vector is hex encoded:

bdbabf3df84f633ed014bd3ef6ca183ffc759b ... 48ca303e8aaa793f5ef67b3fcfab883e

Using the hex-encoded vector, you can connect to the database and execute a query that calls the UNHEX() built-in function to convert the vector to its original binary format during insertion:

memsqlConnection.query('INSERT INTO TABLE vectors VALUES (UNHEX("%s"))' % (vectorStrEncoded))

Performance Note

DOT_PRODUCT(), EUCLIDEAN_DISTANCE(), and VECTOR_SUB() are high-performance functions implemented for fast vector operations, using single-instruction multiple-data (SIMD) processor instructions. Your hardware must support AVX-2 instructions in order to use these functions.

Use Cases

A common use case for these functions is to work with floating-point vectors that describe images, such as images of faces or products. These vectors can be created using a neural network trained to categorize the images. Objects are inserted into a table. One field of the object is the vector that describes the image, obtained by feeding the image into the neural network. Queries using DOT_PRODUCT() or EUCLIDEAN_DISTANCE() can then be run to find closest matches. For example, the following query retrieves the 10 closest matches to the constant vector in the query, based on a cosine similarity match implemented using DOT_PRODUCT().

select t.*, dot_product(t.vector, unhex("...constant vector...")) as score
from t
order by score desc
limit 10;