Updating a table with the closest value from a lookup matrix

TABLE 1

The first table is a matrix containing reference values, something like this:

create table dm_matrix
(x float,
z float,
avgValue float)


insert into dm_matrix values (1,1, rand())
insert into dm_matrix values (1,2, rand())
...
insert into dm_matrix values (4,3, rand())
insert into dm_matrix values (4,4, rand())

      

Creating a matrix like this

z\x |  1  |  2  |  3  |  4  |
-----------------------------
  1 |  .1 |  .7 |  .3 |  .2 |
  2 |  .5 |  .1 |  .8 |  .6 |
  3 |  .6 |  .2 |  .3 |  .9 |
  4 |  .4 |  .3 |  .3 |  .5 |

      

TABLE 2

The second table is a list of events with their coordinates:

create table dm_values 
(vx float,
vz float,
v float)

insert into dm_values (vx, vz) values (1 + rand()*3, 1 + rand()*3)
...
insert into dm_values (vx, vz) values (1 + rand()*3, 1 + rand()*3)

      

Thus, records have coordinates, but no values ​​for events

   vx  |  vz  |   v   |
-----------------------
  1.3  |  2.7 |  null |
  2.6  |  2.7 |  null |
  1.3  |  3.3 |  null |
  1.9  |  1.1 |  null |
  3.0  |  2.9 |  null |
  ...  |  ... |   ... |

      

A TASK

I want to update the second table with the closest value from the first. So for the first record (x = 1.3, z = 2.7), I would like the value to update to the value from the lookup matrix where x = 1 and z = 3 (.6). The updated table will look like this:

   vx  |  vz  |   v   |
-----------------------
  1.3  |  2.7 |   0.6 |
  2.6  |  2.7 |   0.3 |
  1.3  |  3.3 |   0.6 |
  1.9  |  1.1 |   0.7 |
  3.0  |  2.9 |   0.3 |
  ...  |  ... |   ... |

      

PROBLEM

I can get this - in theory - working with cursors. But my TABLE 1 has about 2000 lines (~ 50x40), and my TABLE 2 lines have millions. The cursor approach will take days, weeks, or months. I need a better solution and am grateful for any advice.

+3


source to share


2 answers


From what I can see, you should be able to accomplish a simple UPDATE

using JOIN

where ROUND

values vx

and vz

for conditions JOIN

, performance, you will have to test this on your dataset.

Here is the basic method for the JOIN

data, mind you, I filled in the scripts INSERT

to have a complete matrix:

CREATE TABLE #dm_matrix
    (
      x FLOAT ,
      z FLOAT ,
      avgValue DECIMAL(2, 1)
    )


INSERT  INTO #dm_matrix
VALUES  ( 1, 1, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 1, 2, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 1, 3, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 1, 4, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 2, 1, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 2, 2, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 2, 3, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 2, 4, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 3, 1, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 3, 2, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 3, 3, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 3, 4, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 4, 1, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 4, 2, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 4, 3, RAND() )
INSERT  INTO #dm_matrix
VALUES  ( 4, 4, RAND() )

SELECT  *
FROM    #dm_matrix

CREATE TABLE #dm_values
    (
      vx DECIMAL(2, 1) ,
      vz DECIMAL(2, 1) ,
      v FLOAT
    )

INSERT  INTO #dm_values
        ( vx, vz )
VALUES  ( 1 + RAND() * 3, 1 + RAND() * 3 )
INSERT  INTO #dm_values
        ( vx, vz )
VALUES  ( 1 + RAND() * 3, 1 + RAND() * 3 )

SELECT  *
FROM    #dm_values

-- replace this SELECT with the UPDATE commands below to update values
SELECT  v.vx ,
        v.vz ,
        m.avgValue
FROM    #dm_values v
        INNER JOIN #dm_matrix m ON ROUND(v.vx, 0) = m.x
                                   AND ROUND(v.vz, 0) = m.z

DROP TABLE #dm_matrix
DROP TABLE #dm_values

      

And for UPDATE

you would do something like this:

UPDATE v
SET v.v = m.avgValue
FROM #dm_values v 
INNER JOIN #dm_matrix m ON ROUND(v.vx, 0) = m.x AND ROUND(v.vz, 0) = m.z

SELECT * FROM #dm_values

      

Outputs:

Matrix:



x   z   avgValue
1   1   0.6
1   2   0.9  -- row 2 below
1   3   0.4
1   4   0.5
2   1   0.7
2   2   0.4
2   3   0.5  -- row 1 below
2   4   0.5
3   1   0.4
3   2   0.1
3   3   0.3
3   4   0.8
4   1   0.1
4   2   1.0
4   3   0.5
4   4   0.5  

      

Values:

vx  vz  v
1.8 2.8 NULL  -- x = 2, z = 3
1.3 1.5 NULL  -- x = 1, z = 2

      

After update:

vx  vz  v
1.8 2.8 0.5
1.3 1.5 0.9

      

Note:

I changed the datatype to DECIMAL(2, 1)

for the purposes of this post, so you may need to change this based on the actual dataset.

+1


source


I think you should use cross apply

or a correlated subquery for this.

update t2
    set v = m.avgvalue
    from t2 cross apply
         (select top 1 m.*
          from dm_matrix
          order by square(m.x - t2.vx) + square(m.z - t2.vz)
         );

      



I'm not sure which function you are using for distance, but Euclidean distance seems like a reasonable interpretation.

+2


source







All Articles