Simplify timeline in SQL (Netezza)
I want to generalize / simplify (I don't know what to call) my timeline.
So I have an ID with a timeline. I am trying to get rid of the overlap timeline within the same id.
Here's some sample data. What do I have:
ID START_TIME END_TIME
1 a b
1 c d
1 e f
1 g h
As you can see from the picture, they [a,b],[c,d],[e,f]
overlap, [g,h]
not overlap, so I just want [a,f]
and [g,h]
. What I want:
ID START_TIME END_TIME
1 a f
1 g h
source to share
I think @ shA.T is close. The problem lies in the multiple overlaps that can break. You may need to turn this into a multi-step process
Step 1 (Make a sample table):
create temp table stack (
id integer
,start_time timestamp
,end_time timestamp
)
insert into stack values(1, date('2020-01-01'),date('2020-01-01') + interval '3 hours');
insert into stack values(1,date('2020-01-01') + interval '2 hours',date('2020-01-01') + interval '4 hours');
insert into stack values(1,date('2020-01-01') + interval '3.5 hours',date('2020-01-01') + interval '5 hours');
insert into stack values(1,date('2020-01-01') + interval '5.5 hours',date('2020-01-01') + interval '6.5 hours');
insert into stack values(1,date('2020-01-01') + interval '7.5 hours',date('2020-01-01') + interval '9.5 hours');
insert into stack values(1,date('2020-01-01') + interval '8.5 hours',date('2020-01-01') + interval '10.5 hours');
Step 2 (find single slabs):
create temp table stack2 as
SELECT ID, ps2 as start_time, max(e) AS End_Time
FROM (
SELECT t1.ID, t1.START_TIME AS s, MAX(t1.END_TIME) AS e,
max(t2.START_TIME) As ps, MAX(t2.END_TIME) AS pe
,CASE WHEN pe between s and e THEN ps ELSE s END ps2
FROM stack AS t1
JOIN stack AS t2 ON t1.START_TIME > t2.START_TIME
GROUP BY t1.ID, t1.START_TIME) AS DT
GROUP BY
ID, ps2
ORDER BY ps2
Step 3 (Combining double slabs):
SELECT ID, ps2 as start_time, max(e) AS End_Time
FROM (
SELECT t1.ID, t1.START_TIME AS s, MAX(t1.END_TIME) AS e,
max(t2.START_TIME) As ps, MAX(t2.END_TIME) AS pe
,CASE WHEN pe between s and e THEN ps ELSE s END ps2
FROM stack2 AS t1
JOIN stack2 AS t2 ON t1.START_TIME > t2.START_TIME
GROUP BY t1.ID, t1.START_TIME) AS DT
GROUP BY
ID, ps2
ORDER BY ps2
source to share
I found a solution without adding an extra column like this:
SELECT ID, MIN(CASE WHEN pe between s and e THEN ps ELSE s END) AS START_TIME, MAX(e) AS End_Time
FROM (
SELECT t1.ID, t1.START_TIME AS s, t1.END_TIME AS e,
MAX(t2.START_TIME) As ps, MAX(t2.END_TIME) AS pe
FROM t AS t1
JOIN t AS t2 ON t1.START_TIME > t2.START_TIME
GROUP BY t1.ID, t1.START_TIME, t1.END_TIME ) AS DT
GROUP BY
ID, CASE WHEN pe between s and e THEN 1 ELSE 0 END
ORDER BY s
source to share