Oracle: select missing dates
I have a table with (among other things) dates in a field.
I need to get a list of all dates that are later than the oldest date, older than the most recent date, and are completely missing from the table.
So, if the table contains:
2012-01-02 2012-01-02 2012-01-03 2012-01-05 2012-01-05 2012-01-07 2012-01-08
I need a query that returns:
2012-01-04 2012-01-06
source to share
Something like this (assuming your table has a name your_table
and your date column has a name the_date
):
with date_range as (
select min(the_date) as oldest,
max(the_date) as recent,
max(the_date) - min(the_date) as total_days
from your_table
),
all_dates as (
select oldest + level - 1 as a_date
from date_range
connect by level <= (select total_days from date_range)
)
select ad.a_date
from all_dates ad
left join your_table yt on ad.a_date = yt.the_date
where yt.the_date is null
order by ad.a_date;
Edit: The
clause WITH
is called "common table expression" and is equivalent to a view ("inline view").
He looks like
select *
from (
.....
) all_dates
join your_table ...
The second CTE simply creates a list of dates on the fly using an undocumented Oracle implementation function connect by
.
Reusing select (as with calculating the first and last date) is a little easier (and IMHO more readable) than using views.
Edit 2:
This can be done with a recursive CTE as well:
with date_range as (
select min(the_date) as oldest,
max(the_date) as recent,
max(the_date) - min(the_date) as total_days
from your_table
),
all_dates (a_date, lvl) as (
select oldest as a_date, 1 as lvl
from date_range
union all
select (select oldest from date_range) + lvl, lvl + 1
from all_dates
where lvl < (select total_days from date_range)
)
select ad.a_date, lvl
from all_dates ad
left join your_table yt on ad.a_date = yt.the_date
where yt.the_date is null
order by ad.a_date;
Which should work in all DBMSs that support recursive CTEs (PostgreSQL and Firebird are more standard compatibility, but should be used recursive
).
Notice the hack select (select oldest from date_range) + lvl, lvl + 1
in the recursive part. This is not required, but Oracle still has some bugs about DATEs in recursive CTEs. In PostgreSQL, the following works without issue:
....
all_dates (a_date, lvl) as (
select oldest as a_date, 0 as lvl
from date_range
union all
select a_date + 1, lvl + 1
from all_dates
where lvl < (select total_days from date_range)
)
....
source to share
I would choose this option because it is more efficient:
with all_dates_wo_boundary_values as
( select oldest + level the_date
from ( select min(the_date) oldest
, max(the_date) recent
from your_table
)
connect by level <= recent - oldest - 1
)
select the_date
from all_dates_wo_boundary_values
minus
select the_date
from your_table
And here's some evidence. Install first:
SQL> create table your_table (the_date)
2 as
3 select date '2012-01-02' from dual union all
4 select date '2012-01-02' from dual union all
5 select date '2012-01-03' from dual union all
6 select date '2012-01-05' from dual union all
7 select date '2012-01-05' from dual union all
8 select date '2012-01-07' from dual union all
9 select date '2012-01-08' from dual
10 /
Table created.
SQL> exec dbms_stats.gather_table_stats(user,'your_table')
PL/SQL procedure successfully completed.
SQL> alter session set statistics_level = all
2 /
Session altered.
Horse:
SQL> with date_range as
2 ( select min(the_date) as oldest
3 , max(the_date) as recent
4 , max(the_date) - min(the_date) as total_days
5 from your_table
6 )
7 , all_dates as
8 ( select ( select oldest from date_range) + level as a_date
9 from dual
10 connect by level <= (select total_days from date_range)
11 )
12 select ad.a_date
13 from all_dates ad
14 left join your_table yt on ad.a_date = yt.the_date
15 where yt.the_date is null
16 order by ad.a_date
17 /
A_DATE
-------------------
04-01-2012 00:00:00
06-01-2012 00:00:00
2 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID gaqx49vb9gz9k, child number 0
-------------------------------------
with date_range as ( select min(the_date) as oldest , max(the_date) as recent , max(the_date) - min(the_date) as total_d
ays from your_table )
, all_dates as ( select ( select oldest from date_range) + level as a_date from dual connect by level <= (select total_days from
date_range) ) select
ad.a_date from all_dates ad left join your_table yt on ad.a_date = yt.the_date where yt.the_date is null order by ad.a_date
Plan hash value: 1419150012
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | TEMP TABLE TRANSFORMATION | | 1 | | 2 |00:00:00.01 | 22 | 1 | 1 | | | |
| 2 | LOAD AS SELECT | | 1 | | 1 |00:00:00.01 | 7 | 0 | 1 | 262K| 262K| 262K (0)|
| 3 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 | 0 | 0 | | | |
| 4 | TABLE ACCESS FULL | YOUR_TABLE | 1 | 7 | 7 |00:00:00.01 | 3 | 0 | 0 | | | |
| 5 | SORT ORDER BY | | 1 | 1 | 2 |00:00:00.01 | 12 | 1 | 0 | 2048 | 2048 | 2048 (0)|
|* 6 | FILTER | | 1 | | 2 |00:00:00.01 | 12 | 1 | 0 | | | |
|* 7 | HASH JOIN OUTER | | 1 | 1 | 7 |00:00:00.01 | 12 | 1 | 0 | 1048K| 1048K| 707K (0)|
| 8 | VIEW | | 1 | 1 | 6 |00:00:00.01 | 9 | 1 | 0 | | | |
| 9 | CONNECT BY WITHOUT FILTERING| | 1 | | 6 |00:00:00.01 | 3 | 0 | 0 | | | |
| 10 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | 0 | 0 | | | |
| 11 | VIEW | | 1 | 1 | 1 |00:00:00.01 | 3 | 0 | 0 | | | |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660C_81240964 | 1 | 1 | 1 |00:00:00.01 | 3 | 0 | 0 | | | |
| 13 | TABLE ACCESS FULL | YOUR_TABLE | 1 | 7 | 7 |00:00:00.01 | 3 | 0 | 0 | | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - filter("YT"."THE_DATE" IS NULL)
7 - access("YT"."THE_DATE"=INTERNAL_FUNCTION("AD"."A_DATE"))
32 rows selected.
And my suggestion:
SQL> with all_dates_wo_boundary_values as
2 ( select oldest + level the_date
3 from ( select min(the_date) oldest
4 , max(the_date) recent
5 from your_table
6 )
7 connect by level <= recent - oldest - 1
8 )
9 select the_date
10 from all_dates_wo_boundary_values
11 minus
12 select the_date
13 from your_table
14 /
THE_DATE
-------------------
04-01-2012 00:00:00
06-01-2012 00:00:00
2 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 7aavxmzkj7zq7, child number 0
-------------------------------------
with all_dates_wo_boundary_values as ( select oldest + level the_date from ( select min(the_date) oldest
, max(the_date) recent from your_table ) connect by level <= recent - oldest - 1 ) select
the_date from all_dates_wo_boundary_values minus select the_date from your_table
Plan hash value: 2293301832
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
| 1 | MINUS | | 1 | | 2 |00:00:00.01 | 6 | | | |
| 2 | SORT UNIQUE | | 1 | 1 | 5 |00:00:00.01 | 3 | 9216 | 9216 | 8192 (0)|
| 3 | VIEW | | 1 | 1 | 5 |00:00:00.01 | 3 | | | |
| 4 | CONNECT BY WITHOUT FILTERING| | 1 | | 5 |00:00:00.01 | 3 | | | |
| 5 | VIEW | | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| 6 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| 7 | TABLE ACCESS FULL | YOUR_TABLE | 1 | 7 | 7 |00:00:00.01 | 3 | | | |
| 8 | SORT UNIQUE | | 1 | 7 | 5 |00:00:00.01 | 3 | 9216 | 9216 | 8192 (0)|
| 9 | TABLE ACCESS FULL | YOUR_TABLE | 1 | 7 | 7 |00:00:00.01 | 3 | | | |
-----------------------------------------------------------------------------------------------------------------------------------
22 rows selected.
Regards,
Rob.
source to share
You need a table Calendar
(persistent or created on the fly). Then you can do simple:
SELECT c.my_date
FROM
calendar c
JOIN
( SELECT MIN(date_column) AS min_date
, MAX(date_column) AS max_date
FROM tableX
) mm
ON c.mydate BETWEEN min_date AND max_date
WHERE
c.my_date NOT IN
( SELECT date_column
FROM tableX
)
source to share