Is a relational database a good fit for vector computing?

Question

Is a relational database a good fit for vector computing?

The basic table schema looks something like this (I'm using MySQL BTW):

integer unsigned vector-id
integer unsigned fk-attribute-id
float attribute-value
primary key (vector-id, fk-attribute-id)

A vector is represented as multiple rows in the table that share the same vector-id.

I need to build a separate table holding the dot product (and also the Euclidean distance) of every pair of vectors in that table. So I need a result table that looks like this:

integer unsigned fk-vector-id-a
integer unsigned fk-vector-id-b
float dot-product

... and another one like this:

integer unsigned fk-vector-id-a
integer unsigned fk-vector-id-b
float euclidean-distance

What is the best query structure for getting my result?

With very large vectors, is a relational database the best approach to this problem, or should I load the vectors into the application and do the calculations there?


optimization math sql vector

JR Lawhorne 26 Aug '09 at 16:19


1 answer

Quassnoi · Accepted Answer · 2009-08-26T16:32:36+0000

INSERT
INTO    dot_products
SELECT  v1.vector_id, v2.vector_id, SUM(v1.attribute_value * v2.attribute_value)
FROM    attributes v1
JOIN    attributes v2
ON      v2.attribute_id = v1.attribute_id
GROUP BY
        v1.vector_id, v2.vector_id
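
To check what the self-join above actually produces, here is a minimal sketch that runs the same statement against a few made-up rows. It uses Python's standard `sqlite3` module with an in-memory SQLite database as a stand-in for MySQL, so the `CREATE TABLE` statements, the `dot_products` column names and the sample data are assumptions rather than part of the original answer.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema below is an assumption
# based on the table and column names used in the answer's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attributes (
    vector_id       INTEGER NOT NULL,
    attribute_id    INTEGER NOT NULL,
    attribute_value REAL    NOT NULL,
    PRIMARY KEY (vector_id, attribute_id)
);
CREATE TABLE dot_products (
    vector_id_a INTEGER NOT NULL,
    vector_id_b INTEGER NOT NULL,
    dot_product REAL    NOT NULL,
    PRIMARY KEY (vector_id_a, vector_id_b)
);
""")

# Made-up data: vector 1 = (1, 2, 3), vector 2 = (4, 5, 6),
# vector 3 is sparse and stores only attribute 2.
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?)",
    [(1, 1, 1.0), (1, 2, 2.0), (1, 3, 3.0),
     (2, 1, 4.0), (2, 2, 5.0), (2, 3, 6.0),
     (3, 2, 10.0)],
)

# The statement from the answer, unchanged.
conn.execute("""
INSERT
INTO    dot_products
SELECT  v1.vector_id, v2.vector_id, SUM(v1.attribute_value * v2.attribute_value)
FROM    attributes v1
JOIN    attributes v2
ON      v2.attribute_id = v1.attribute_id
GROUP BY
        v1.vector_id, v2.vector_id
""")

# Collect the results keyed by (vector_id_a, vector_id_b).
dot = {(a, b): d for a, b, d in conn.execute("SELECT * FROM dot_products")}
for pair in sorted(dot):
    print(pair, dot[pair])
```

On this data the query returns every ordered pair, including each vector paired with itself, so (1, 2) and (2, 1) both carry 32 and (1, 1) carries the squared norm 14. Adding `WHERE v1.vector_id < v2.vector_id` would keep one row per unordered pair. Pairs with no attribute in common produce no row at all.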

In MySQL, the following form can be faster (it assumes a separate vector table with one row per vector_id):

INSERT
INTO    dot_products
SELECT  v1.vector_id, v2.vector_id,
        (
        SELECT  SUM(va1.attribute_value * va2.attribute_value)
        FROM    attributes va1
        JOIN    attributes va2
        ON      va2.attribute_id = va1.attribute_id
        WHERE   va1.vector_id = v1.vector_id
                AND va2.vector_id = v2.vector_id
        )
FROM    vector v1
CROSS JOIN
        vector v2
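
The queries above cover the dot product only. For the Euclidean distance, one possible approach (not from the original answer) is the identity |a - b|² = a·a + b·b - 2 a·b. It stays correct when a vector has no row for some attribute, whereas summing squared differences over a plain join on attribute_id would silently skip attributes stored for only one of the two vectors. The sketch below uses Python's `sqlite3` with made-up rows as a stand-in for MySQL; the aliases `sq_norm`, `dot`, `id_a` and `id_b` are illustrative, and `SQRT` is registered by hand because not every SQLite build provides it (MySQL has it built in).

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register SQRT explicitly so the sketch does not depend on the SQLite build.
conn.create_function("SQRT", 1, math.sqrt)
conn.executescript("""
CREATE TABLE attributes (
    vector_id       INTEGER NOT NULL,
    attribute_id    INTEGER NOT NULL,
    attribute_value REAL    NOT NULL,
    PRIMARY KEY (vector_id, attribute_id)
);
""")

# Made-up data: vector 1 = (1, 2, 3, 0), vector 2 = (4, 6, 3, 0),
# vector 3 = (0, 0, 0, 2); zero components are simply not stored.
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?)",
    [(1, 1, 1.0), (1, 2, 2.0), (1, 3, 3.0),
     (2, 1, 4.0), (2, 2, 6.0), (2, 3, 3.0),
     (3, 4, 2.0)],
)

# n1/n2 hold squared norms per vector; d holds dot products for pairs that
# share at least one attribute. Missing dot products count as 0.
rows = conn.execute("""
SELECT  n1.vector_id, n2.vector_id,
        SQRT(n1.sq_norm + n2.sq_norm - 2 * COALESCE(d.dot, 0))
FROM    (SELECT vector_id, SUM(attribute_value * attribute_value) AS sq_norm
         FROM   attributes GROUP BY vector_id) n1
JOIN    (SELECT vector_id, SUM(attribute_value * attribute_value) AS sq_norm
         FROM   attributes GROUP BY vector_id) n2
ON      n1.vector_id < n2.vector_id
LEFT JOIN
        (SELECT v1.vector_id AS id_a, v2.vector_id AS id_b,
                SUM(v1.attribute_value * v2.attribute_value) AS dot
         FROM   attributes v1
         JOIN   attributes v2
         ON     v2.attribute_id = v1.attribute_id
         GROUP BY v1.vector_id, v2.vector_id) d
ON      d.id_a = n1.vector_id AND d.id_b = n2.vector_id
""").fetchall()

# Distances keyed by (vector_id_a, vector_id_b), one row per unordered pair.
dist = {(a, b): e for a, b, e in rows}
for pair in sorted(dist):
    print(pair, dist[pair])
```

Vectors 1 and 3 share no attribute, so they only get a distance because of the `LEFT JOIN` and `COALESCE`. For nearly identical vectors, rounding can push the expression under the square root slightly below zero, so clamping it at zero first may be prudent.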
