Generate duplicate rows in Select statement in Oracle
I have a select statement with more than 10 columns. I have to repeat lines wherever data is missing based on the date. Rows to be generated must have data from previous rows sorted by date in ascending order. The date range to be considered is based on the grouping id.
The date is the range from March 15th to April 16th, but I only took limited strings for the sample.
For example, the data is shown below.
ID Date Type Code Location
==== ====== === ==== ====
1 15-Mar TG RET X1
1 17-Mar GG CAN S2
1 20-Mar DTR ISS D2
2 14-Apr YT RR F2
2 16-Apr F FC F1
Excluded output:
ID Date Type Code Location
=== ==== ==== ==== ======
1 15-Mar TG RET X1
*1 16-Mar TG RET X1*
1 17-Mar GG CAN S2
*1 18-Mar GG CAN S2*
*1 19-Mar GG CAN S2*
1 20-Mar DTR ISS D2
2 14-Apr YT RR F2
*2 15-Apr YT RR F2*
2 16-Apr F FC F1
source to share
Here is a working example of a possible way to achieve the desired result. I am using Oracle analytic function LAST_VALUE
with the IGNORE NULLS
and option ORDER BY
.
Test data:
CREATE TABLE so123 (
id NUMBER,
d DATE,
type VARCHAR2(10),
code VARCHAR2(10),
location VARCHAR2(10)
);
INSERT INTO so123 VALUES (1, DATE '2015-05-15', 'TG', 'RET', 'X1');
INSERT INTO so123 VALUES (1, DATE '2015-05-17', 'GG', 'CAN', 'S2');
INSERT INTO so123 VALUES (1, DATE '2015-05-20', 'DTR', 'ISS', 'D2');
INSERT INTO so123 VALUES (2, DATE '2015-04-14', 'YT', 'RR', 'F2');
INSERT INTO so123 VALUES (2, DATE '2015-04-16', 'F', 'FC', 'F1');
COMMIT;
Choose yourself:
WITH
dmm AS (
SELECT MIN(d) min_d, MAX(d) max_d FROM so123
)
SELECT
NVL(s.id, LAST_VALUE(s.id) IGNORE NULLS OVER (ORDER BY dt.d)) AS id,
dt.d,
NVL(s.type, LAST_VALUE(s.type) IGNORE NULLS OVER (ORDER BY dt.d)) AS type,
NVL(s.code, LAST_VALUE(s.code) IGNORE NULLS OVER (ORDER BY dt.d)) AS code,
NVL(s.location, LAST_VALUE(s.location) IGNORE NULLS OVER (ORDER BY dt.d)) AS location
FROM (
SELECT min_d + level - 1 as d
FROM dmm
CONNECT BY min_d + level - 1 <= max_d
) dt LEFT JOIN so123 s ON (dt.d = s.d)
ORDER BY dt.d
;
Output:
ID D TYPE CODE LOCATION
---------- ---------------- ---------- ---------- ----------
2 14-04-2015 00:00 YT RR F2
2 15-04-2015 00:00 YT RR F2
2 16-04-2015 00:00 F FC F1
2 17-04-2015 00:00 F FC F1
2 18-04-2015 00:00 F FC F1
2 19-04-2015 00:00 F FC F1
2 20-04-2015 00:00 F FC F1
2 21-04-2015 00:00 F FC F1
2 22-04-2015 00:00 F FC F1
2 23-04-2015 00:00 F FC F1
2 24-04-2015 00:00 F FC F1
2 25-04-2015 00:00 F FC F1
2 26-04-2015 00:00 F FC F1
2 27-04-2015 00:00 F FC F1
2 28-04-2015 00:00 F FC F1
2 29-04-2015 00:00 F FC F1
2 30-04-2015 00:00 F FC F1
2 01-05-2015 00:00 F FC F1
2 02-05-2015 00:00 F FC F1
2 03-05-2015 00:00 F FC F1
2 04-05-2015 00:00 F FC F1
2 05-05-2015 00:00 F FC F1
2 06-05-2015 00:00 F FC F1
2 07-05-2015 00:00 F FC F1
2 08-05-2015 00:00 F FC F1
2 09-05-2015 00:00 F FC F1
2 10-05-2015 00:00 F FC F1
2 11-05-2015 00:00 F FC F1
2 12-05-2015 00:00 F FC F1
2 13-05-2015 00:00 F FC F1
2 14-05-2015 00:00 F FC F1
1 15-05-2015 00:00 TG RET X1
1 16-05-2015 00:00 TG RET X1
1 17-05-2015 00:00 GG CAN S2
1 18-05-2015 00:00 GG CAN S2
1 19-05-2015 00:00 GG CAN S2
1 20-05-2015 00:00 DTR ISS D2
37 rows selected
How it works? We are generating all dates between MIN and MAX dates from the original table. To do this, we use a suggestion CONNECT BY
to force Oracle to generate records until the condition is min_d + level - 1 <= max_d
no longer met.
Then we take the generated records and the LEFT JOIN
original table. This is where the magic of the analytic function comes in LAST_VALUE
. This function searches the IGNORE NULLS
table for the last nonzero value (value ) using the specified ordering and filling in the missing fields.
You can learn more about this feature here:
http://oracle-base.com/articles/misc/first-value-and-last-value-analytic-functions.php
source to share
Fear not, you must when faced with a problem that can be optimally solved by Oracle's suggestion MODEL
. The following query will return the desired result:
Decision:
SELECT id, d, type, code, location
FROM (
SELECT
id, d, type, code, location,
null min_d,
null max_d
FROM t
UNION ALL
SELECT
id, null, null, null, null,
MIN(d),
MAX(d)
FROM t
GROUP BY id
)
MODEL RETURN UPDATED ROWS
PARTITION BY (id)
DIMENSION BY (d)
MEASURES (type, code, location, min_d, max_d)
RULES (
type [FOR d FROM min_d[null] TO max_d[null] INCREMENT INTERVAL '1' DAY] =
NVL(type[cv(d)], type[cv(d) - 1]),
code [FOR d FROM min_d[null] TO max_d[null] INCREMENT INTERVAL '1' DAY] =
NVL(code[cv(d)], code[cv(d) - 1]),
location[FOR d FROM min_d[null] TO max_d[null] INCREMENT INTERVAL '1' DAY] =
NVL(location[cv(d)], location[cv(d) - 1])
)
ORDER BY id, d
Result:
| ID | D | TYPE | CODE | LOCATION | |----|-------------------------|------|------|----------| | 1 | March, 15 2015 00:00:00 | TG | RET | X1 | | 1 | March, 16 2015 00:00:00 | TG | RET | X1 | | 1 | March, 17 2015 00:00:00 | GG | CAN | S2 | | 1 | March, 18 2015 00:00:00 | GG | CAN | S2 | | 1 | March, 19 2015 00:00:00 | GG | CAN | S2 | | 1 | March, 20 2015 00:00:00 | DTR | ISS | D2 | | 2 | April, 14 2015 00:00:00 | YT | RR | F2 | | 2 | April, 15 2015 00:00:00 | YT | RR | F2 | | 2 | April, 16 2015 00:00:00 | F | FC | F1 |
Explanation:
Think of a MODEL
SQL table language like Microsoft Excel, but much more powerful - because SQL!
SELECT id, d, type, code, location
FROM (
-- This is your original data, plus two columns
SELECT
id, d, type, code, location,
null min_d,
null max_d
FROM t
UNION ALL
-- This is a utility record containing the MIN(d) and MAX(d) values for
-- each ID partition. We'll use these MIN / MAX values to generate rows
SELECT
id, null, null, null, null,
MIN(d),
MAX(d)
FROM t
GROUP BY id
)
-- We're using the RETURN UPDATED ROWS clause, as we don't want the utility
-- record from above in the results
MODEL RETURN UPDATED ROWS
-- Your requirement is to fill gaps between dates within each id PARTITION
PARTITION BY (id)
-- The dates are your DIMENSION, i.e. the axis along which we're generating rows
DIMENSION BY (d)
-- The remaining rows are the MEASURES, i.e. the calculated values in each "cell"
MEASURES (type, code, location, min_d, max_d)
-- The following RULES are used to generate rows. For each MEASURE, we simply
-- iterate from the MIN(d) to the MAX(d) value, referencing the min_d / max_d
-- values from the utility record above
RULES (
type [FOR d FROM min_d[null] TO max_d[null] INCREMENT INTERVAL '1' DAY] =
NVL(type[cv(d)], type[cv(d) - 1]),
code [FOR d FROM min_d[null] TO max_d[null] INCREMENT INTERVAL '1' DAY] =
NVL(code[cv(d)], code[cv(d) - 1]),
location[FOR d FROM min_d[null] TO max_d[null] INCREMENT INTERVAL '1' DAY] =
NVL(location[cv(d)], location[cv(d) - 1])
)
ORDER BY id, d
source to share