Updating a table with the closest value from a lookup matrix
TABLE 1
The first table is a matrix containing reference values, something like this:
create table dm_matrix
(x float,
z float,
avgValue float)
insert into dm_matrix values (1,1, rand())
insert into dm_matrix values (1,2, rand())
...
insert into dm_matrix values (4,3, rand())
insert into dm_matrix values (4,4, rand())
Creating a matrix like this
z\x | 1 | 2 | 3 | 4 |
-----------------------------
1 | .1 | .7 | .3 | .2 |
2 | .5 | .1 | .8 | .6 |
3 | .6 | .2 | .3 | .9 |
4 | .4 | .3 | .3 | .5 |
TABLE 2
The second table is a list of events with their coordinates:
create table dm_values
(vx float,
vz float,
v float)
insert into dm_values (vx, vz) values (1 + rand()*3, 1 + rand()*3)
...
insert into dm_values (vx, vz) values (1 + rand()*3, 1 + rand()*3)
Thus, records have coordinates, but no values ββfor events
vx | vz | v |
-----------------------
1.3 | 2.7 | null |
2.6 | 2.7 | null |
1.3 | 3.3 | null |
1.9 | 1.1 | null |
3.0 | 2.9 | null |
... | ... | ... |
A TASK
I want to update the second table with the closest value from the first. So for the first record (x = 1.3, z = 2.7), I would like the value to update to the value from the lookup matrix where x = 1 and z = 3 (.6). The updated table will look like this:
vx | vz | v |
-----------------------
1.3 | 2.7 | 0.6 |
2.6 | 2.7 | 0.3 |
1.3 | 3.3 | 0.6 |
1.9 | 1.1 | 0.7 |
3.0 | 2.9 | 0.3 |
... | ... | ... |
PROBLEM
I can get this - in theory - working with cursors. But my TABLE 1 has about 2000 lines (~ 50x40), and my TABLE 2 lines have millions. The cursor approach will take days, weeks, or months. I need a better solution and am grateful for any advice.
source to share
From what I can see, you should be able to accomplish a simple UPDATE
using JOIN
where ROUND
values vx
and vz
for conditions JOIN
, performance, you will have to test this on your dataset.
Here is the basic method for the JOIN
data, mind you, I filled in the scripts INSERT
to have a complete matrix:
CREATE TABLE #dm_matrix
(
x FLOAT ,
z FLOAT ,
avgValue DECIMAL(2, 1)
)
INSERT INTO #dm_matrix
VALUES ( 1, 1, RAND() )
INSERT INTO #dm_matrix
VALUES ( 1, 2, RAND() )
INSERT INTO #dm_matrix
VALUES ( 1, 3, RAND() )
INSERT INTO #dm_matrix
VALUES ( 1, 4, RAND() )
INSERT INTO #dm_matrix
VALUES ( 2, 1, RAND() )
INSERT INTO #dm_matrix
VALUES ( 2, 2, RAND() )
INSERT INTO #dm_matrix
VALUES ( 2, 3, RAND() )
INSERT INTO #dm_matrix
VALUES ( 2, 4, RAND() )
INSERT INTO #dm_matrix
VALUES ( 3, 1, RAND() )
INSERT INTO #dm_matrix
VALUES ( 3, 2, RAND() )
INSERT INTO #dm_matrix
VALUES ( 3, 3, RAND() )
INSERT INTO #dm_matrix
VALUES ( 3, 4, RAND() )
INSERT INTO #dm_matrix
VALUES ( 4, 1, RAND() )
INSERT INTO #dm_matrix
VALUES ( 4, 2, RAND() )
INSERT INTO #dm_matrix
VALUES ( 4, 3, RAND() )
INSERT INTO #dm_matrix
VALUES ( 4, 4, RAND() )
SELECT *
FROM #dm_matrix
CREATE TABLE #dm_values
(
vx DECIMAL(2, 1) ,
vz DECIMAL(2, 1) ,
v FLOAT
)
INSERT INTO #dm_values
( vx, vz )
VALUES ( 1 + RAND() * 3, 1 + RAND() * 3 )
INSERT INTO #dm_values
( vx, vz )
VALUES ( 1 + RAND() * 3, 1 + RAND() * 3 )
SELECT *
FROM #dm_values
-- replace this SELECT with the UPDATE commands below to update values
SELECT v.vx ,
v.vz ,
m.avgValue
FROM #dm_values v
INNER JOIN #dm_matrix m ON ROUND(v.vx, 0) = m.x
AND ROUND(v.vz, 0) = m.z
DROP TABLE #dm_matrix
DROP TABLE #dm_values
And for UPDATE
you would do something like this:
UPDATE v
SET v.v = m.avgValue
FROM #dm_values v
INNER JOIN #dm_matrix m ON ROUND(v.vx, 0) = m.x AND ROUND(v.vz, 0) = m.z
SELECT * FROM #dm_values
Outputs:
Matrix:
x z avgValue
1 1 0.6
1 2 0.9 -- row 2 below
1 3 0.4
1 4 0.5
2 1 0.7
2 2 0.4
2 3 0.5 -- row 1 below
2 4 0.5
3 1 0.4
3 2 0.1
3 3 0.3
3 4 0.8
4 1 0.1
4 2 1.0
4 3 0.5
4 4 0.5
Values:
vx vz v
1.8 2.8 NULL -- x = 2, z = 3
1.3 1.5 NULL -- x = 1, z = 2
After update:
vx vz v
1.8 2.8 0.5
1.3 1.5 0.9
Note:
I changed the datatype to DECIMAL(2, 1)
for the purposes of this post, so you may need to change this based on the actual dataset.
source to share
I think you should use cross apply
or a correlated subquery for this.
update t2
set v = m.avgvalue
from t2 cross apply
(select top 1 m.*
from dm_matrix
order by square(m.x - t2.vx) + square(m.z - t2.vz)
);
I'm not sure which function you are using for distance, but Euclidean distance seems like a reasonable interpretation.
source to share