Is a relational database a good fit for vector computing?
The basic table schema looks something like this (I'm using MySQL BTW):
integer unsigned vector-id
integer unsigned fk-attribute-id
float attribute-value
primary key (vector-id,fk-attribute-id)
A vector is represented as multiple rows in the table that share the same vector-id.
I need to build a separate table holding the dot product (and also the Euclidean distance) of every pair of vectors in that table. So, I need a results table that looks like this:
integer unsigned fk-vector-id-a
integer unsigned fk-vector-id-b
float dot-product
... and another one like this ...
integer unsigned fk-vector-id-a
integer unsigned fk-vector-id-b
float euclidean-distance
What is the best query structure for getting my result?
With very large vectors, is a relational database the best approach to solve this problem, or should I internalize the vectors in the application and do the calculations there?
INSERT
INTO    dot_products
SELECT  v1.vector_id, v2.vector_id, SUM(v1.attribute_value * v2.attribute_value)
FROM    attributes v1
JOIN    attributes v2
ON      v2.attribute_id = v1.attribute_id
GROUP BY
        v1.vector_id, v2.vector_id
In MySQL, this form can be faster (it reads the vector ids from a separate vector table with one row per vector):
INSERT
INTO    dot_products
SELECT  v1.vector_id, v2.vector_id,
        (
        SELECT  SUM(va1.attribute_value * va2.attribute_value)
        FROM    attributes va1
        JOIN    attributes va2
        ON      va2.attribute_id = va1.attribute_id
        WHERE   va1.vector_id = v1.vector_id
                AND va2.vector_id = v2.vector_id
        )
FROM    vector v1
CROSS JOIN
        vector v2
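For comparison, here is a rough sketch of the application-side alternative mentioned in the question, which also covers the Euclidean distance. It assumes the rows have already been fetched as (vector_id, attribute_id, attribute_value) tuples and that an attribute missing from a vector counts as 0. The function names and the sample rows are made up for illustration. Unlike the queries above, it emits each unordered pair only once.

```python
import math
from collections import defaultdict
from itertools import combinations


def load_vectors(rows):
    """Group (vector_id, attribute_id, attribute_value) rows into sparse dicts."""
    vectors = defaultdict(dict)
    for vector_id, attribute_id, value in rows:
        vectors[vector_id][attribute_id] = value
    return dict(vectors)


def dot_product(a, b):
    """Sum of products over the attributes present in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(value * b[key] for key, value in a.items() if key in b)


def euclidean_distance(a, b):
    """Distance over the union of attributes; a missing attribute is treated as 0 (assumption)."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def pairwise(vectors):
    """Yield (id_a, id_b, dot product, Euclidean distance) for each unordered pair."""
    for id_a, id_b in combinations(sorted(vectors), 2):
        a, b = vectors[id_a], vectors[id_b]
        yield id_a, id_b, dot_product(a, b), euclidean_distance(a, b)


# Hypothetical sample rows standing in for the contents of the attributes table.
rows = [(1, 1, 1.0), (1, 2, 2.0), (2, 1, 3.0), (2, 3, 4.0)]
results = list(pairwise(load_vectors(rows)))
```

The work is still quadratic in the number of vectors, so whether this beats the SQL version depends on how many vectors there are and whether they fit in memory.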