You are viewing an older version of this section. View current production version.
MemSQL includes the following built-in functions for working with vectors of 32-bit floating-point numbers, represented as blobs:
Inserting Vectors as Blobs from an External Application
As a best practice when inserting binary data into a table from an external application, first convert a vector blob to its hexadecimal representation. Then in MemSQL, execute the
INSERT statement using the
UNHEX() built-in function to convert the hex representation back to its original binary format.
Consider the following example written in Python, which uses a few modules to generate vectors and convert them to hex:
import numpy import struct vector = numpy.random.random(1024) vectorStr = "".join([struct.pack('f', elem) for elem in vector]) vectorStrEncoded = vectorStr.encode('hex')
When this code is executed, a random floating point vector is generated:
[0.09361789 0.22198475 0.36929942 ..., 0.97525847 0.98422799 0.26693578]
The vector is then converted to a binary string representation. Finally, the resulting vector is hex encoded:
bdbabf3df84f633ed014bd3ef6ca183ffc759b ... 48ca303e8aaa793f5ef67b3fcfab883e
Using the hex-encoded vector, you can connect to the database and execute a query that calls the
UNHEX() built-in function to convert the vector to its original binary format during insertion:
memsqlConnection.query('INSERT INTO TABLE vectors VALUES (UNHEX("%s"))' % (vectorStrEncoded))
VECTOR_SUB() are high-performance functions implemented for fast vector operations, using single-instruction multiple-data (SIMD) processor instructions. Your hardware must support AVX-2 instructions in order to use these functions.
A common use case for these functions is to work with floating-point vectors that describe images, such as images of faces or products. These vectors can be created using a neural network trained to categorize the images. Objects are inserted into a table. One field of the object is the vector that describes the image, obtained by feeding the image into the neural network. Queries using
EUCLIDEAN_DISTANCE() can then be run to find closest matches. For example, the following query retrieves the 10 closest matches to the constant vector in the query, based on a cosine similarity match implemented using
select t.*, dot_product(t.vector, unhex("...constant vector...")) as score from t order by score desc limit 10;