Is a relational database a good fit for vector computing?

The basic table schema looks something like this (I'm using MySQL BTW):

integer unsigned vector-id
integer unsigned fk-attribute-id
float attribute-value
primary key (vector-id,fk-attribute-id)


A vector is represented as multiple rows in the table that share the same vector-id.

I need to build a separate table holding the dot product (and also the Euclidean distance) of every pair of vectors in that table. So, I need a results table that looks like this:

integer unsigned fk-vector-id-a
integer unsigned fk-vector-id-b
float dot-product




... and another one like this:

integer unsigned fk-vector-id-a
integer unsigned fk-vector-id-b
float euclidean-distance


What is the best query structure for getting my result?

With very large vectors, is a relational database the best approach to this problem, or should I load the vectors into the application and do the calculations there?


1 answer


INSERT
INTO    dot_products
SELECT  v1.vector_id, v2.vector_id, SUM(v1.attribute_value * v2.attribute_value)
FROM    attributes v1
JOIN    attributes v2
ON      v2.attribute_id = v1.attribute_id
GROUP BY
        v1.vector_id, v2.vector_id
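
To see what this query produces, here is a minimal sketch that runs it on two small vectors. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made only so the example is self-contained. The dot_products column names and the sample data are hypothetical. The attributes table follows the names used in the query.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption, for a self-contained example).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attributes (
        vector_id       INTEGER NOT NULL,
        attribute_id    INTEGER NOT NULL,
        attribute_value REAL    NOT NULL,
        PRIMARY KEY (vector_id, attribute_id)
    );
    -- Column names below are hypothetical; the answer does not list them.
    CREATE TABLE dot_products (
        vector_id_a INTEGER NOT NULL,
        vector_id_b INTEGER NOT NULL,
        dot_product REAL    NOT NULL,
        PRIMARY KEY (vector_id_a, vector_id_b)
    );
""")

# Sample data: vector 1 = (1, 2, 3), vector 2 = (4, 5, 6).
rows = [(1, 1, 1.0), (1, 2, 2.0), (1, 3, 3.0),
        (2, 1, 4.0), (2, 2, 5.0), (2, 3, 6.0)]
conn.executemany("INSERT INTO attributes VALUES (?, ?, ?)", rows)

# The self-join from the answer: pair up rows that share an attribute_id,
# multiply the values, and sum per pair of vectors.
conn.execute("""
    INSERT INTO dot_products
    SELECT v1.vector_id, v2.vector_id,
           SUM(v1.attribute_value * v2.attribute_value)
    FROM attributes v1
    JOIN attributes v2 ON v2.attribute_id = v1.attribute_id
    GROUP BY v1.vector_id, v2.vector_id
""")

# Collect the results keyed by (vector_id_a, vector_id_b).
dot = {(a, b): d for a, b, d in conn.execute("SELECT * FROM dot_products")}
print(dot)
```

Note that the result contains both orderings of each pair as well as each vector paired with itself. Two vectors that share no attribute produce no row at all, because the inner join finds nothing to match.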

      

In MySQL, this form can be faster. It assumes a vector table with one row per vector_id:



INSERT
INTO    dot_products
SELECT  v1.vector_id, v2.vector_id,
        (
        SELECT  SUM(va1.attribute_value * va2.attribute_value)
        FROM    attributes va1
        JOIN    attributes va2
        ON      va2.attribute_id = va1.attribute_id
        WHERE   va1.vector_id = v1.vector_id
                AND va2.vector_id = v2.vector_id
        )
FROM    vector v1
CROSS JOIN
        vector v2
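
The queries above cover the dot product only. For the Euclidean distance the question also asks about, one possible approach (not part of the original answer) is to derive it from the dot products with the identity |a - b|^2 = a.a + b.b - 2 a.b. This treats an attribute that is missing from a vector as zero, which is an assumption about the data. The sketch below uses Python's sqlite3 as a stand-in for MySQL, and the dot_products column names and sample values are hypothetical.

```python
import math
import sqlite3

# SQLite stands in for MySQL here (assumption, for a self-contained example).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical column names for the dot-product results table.
    CREATE TABLE dot_products (
        vector_id_a INTEGER NOT NULL,
        vector_id_b INTEGER NOT NULL,
        dot_product REAL    NOT NULL,
        PRIMARY KEY (vector_id_a, vector_id_b)
    );
""")

# Dot products for vector 1 = (1, 2, 3) and vector 2 = (4, 5, 6),
# including each vector paired with itself.
conn.executemany(
    "INSERT INTO dot_products VALUES (?, ?, ?)",
    [(1, 1, 14.0), (1, 2, 32.0), (2, 1, 32.0), (2, 2, 77.0)],
)

# Squared distance from the identity |a - b|^2 = a.a + b.b - 2 a.b.
# aa and bb look up the self-pair rows (a, a) and (b, b).
squared = conn.execute("""
    SELECT ab.vector_id_a, ab.vector_id_b,
           aa.dot_product + bb.dot_product - 2 * ab.dot_product
    FROM dot_products ab
    JOIN dot_products aa
      ON aa.vector_id_a = ab.vector_id_a AND aa.vector_id_b = ab.vector_id_a
    JOIN dot_products bb
      ON bb.vector_id_a = ab.vector_id_b AND bb.vector_id_b = ab.vector_id_b
""").fetchall()

# Clamp at zero before the square root to absorb floating-point rounding.
distances = {(a, b): math.sqrt(max(0.0, d2)) for a, b, d2 in squared}
print(distances)
```

The same SELECT could feed an INSERT into the distance table, with the square root taken in SQL instead. Pairs that have no row in dot_products (vectors sharing no attribute) would need their dot product treated as 0 for this to cover every pair.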

      
