Select nearest neighbors
consider the following data:
category | index | value
-------------------------
cat 1 | 1 | 2
cat 1 | 2 | 3
cat 1 | 3 |
cat 1 | 4 | 1
cat 2 | 1 | 5
cat 2 | 2 |
cat 2 | 3 |
cat 2 | 4 | 6
cat 3 | 1 |
cat 3 | 2 |
cat 3 | 3 | 2
cat 3 | 4 | 1
I'm trying to fill the holes so that hole = avg(value)
of the two nearest neighbors with non-zero values ββin the category:
category | index | value
-------------------------
cat 1 | 1 | 2
cat 1 | 2 | 3
cat 1 | 3 | 2*
cat 1 | 4 | 1
cat 2 | 1 | 5
cat 2 | 2 | 5.5*
cat 2 | 3 | 5.5*
cat 2 | 4 | 6
cat 3 | 1 | 1.5*
cat 3 | 2 | 1.5*
cat 3 | 3 | 2
cat 3 | 4 | 1
I'm playing around with window functions and I'm sure this can be achieved, but the solution eludes me.
Any ideas?
You are right, the window function is what you are looking for. Here's how it can be done (the part is with
used to define the table, so you probably won't need it):
with dt as
(
select * from
(
values
('cat 1', 1, 2),
('cat 1', 2, 3),
('cat 1', 3, null),
('cat 1', 4, 1),
('cat 2', 1, 5),
('cat 2', 2, null),
('cat 2', 3, null),
('cat 2', 4, 6),
('cat 3', 1, null),
('cat 3', 2, null),
('cat 3', 3, 1),
('cat 3', 4, 2)
) tbl ("category", "index", "value")
)
select
"category",
"index",
case
when "value" is null then (avg("value") over (partition by "category") )
else "value"
end
from dt
order by "category", "index";
refer to a section of WINDOW Clause
this page for more information on window functions.
I worked on a solution for you, but the SQLfiddle is giving (internal) errors at the moment, so I cannot complete it.
A statement like this should do the update for you:
update table1 as t1
set value =
(select avg(value)
from
(select value
from table1 as t3
where t1.category = t3.category
and t3.index in (t1.index - 1, t1.index + 1)
) AS T2
)
where value is null
;
The script I was working on is here: http://sqlfiddle.com/#!15/acbc2/1
While I'm sure it is possible to do some terribly complex and nested assertion that does what you want, I meant to say that sometimes it's better to write a script in a regular programming language like python / ruby ββ/ java that does iterate over the DB table and make any changes you want.
It will be much more maintainable and you will want to rebuild the whole thing every time you need to make any changes to it (for example, use 3 nearest neighbors instead or change the definition of "nearest neighbor")