SQL Oracle Query: can this query be optimized?
I want to select the most recent records from a database table that contains multiple rows with the same product number (but different dates). In this particular case, I want to filter the table TKP_NOISE_MOTOR_RESULTS to each PRODUCT_NUMBER that occurs at least five times in the table, and from this subset I want to get the row with the most recent SAVEDATE for each product.
I managed to write such a query, but it uses three references to the table (T1, T2, T3). I have a feeling this could be done with only two, and without the join. It took me many hours to get this far because translating the script from MySQL to Oracle was difficult.
Can the following query be optimized to need fewer subqueries?
SELECT * FROM
messfeld.TKP_NOISE_MOTOR_RESULTS T1
JOIN
(
SELECT PRODUCT_NUMBER, COUNT(*)
FROM messfeld.TKP_NOISE_MOTOR_RESULTS
GROUP BY PRODUCT_NUMBER
HAVING COUNT(*)>5
) T2
ON T1.PRODUCT_NUMBER=T2.PRODUCT_NUMBER
WHERE T1.SAVEDATE BETWEEN '27-AUG-14' AND '28-AUG-14' AND
(T1.SAVEDATE, T1.PRODUCT_NUMBER) IN
(
SELECT MAX(T3.SAVEDATE), T3.PRODUCT_NUMBER
FROM messfeld.TKP_NOISE_MOTOR_RESULTS T3
WHERE
T2.PRODUCT_NUMBER=T3.PRODUCT_NUMBER
GROUP BY PRODUCT_NUMBER
);
If you only want the product number and date, you can simplify your current approach a little. As you suspected, you are hitting the table more times than you need to, and you can replace the join with an IN clause:
SELECT PRODUCT_NUMBER, MAX(SAVEDATE)
FROM TKP_NOISE_MOTOR_RESULTS
WHERE PRODUCT_NUMBER IN (
SELECT PRODUCT_NUMBER
FROM TKP_NOISE_MOTOR_RESULTS
GROUP BY PRODUCT_NUMBER
HAVING COUNT(*)>5
)
GROUP BY PRODUCT_NUMBER;
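A minimal sketch of this simplification, run through Python's sqlite3 module with SQLite standing in for Oracle (an assumption; the table name comes from the question, but the sample rows are invented and SAVEDATE is stored as ISO-8601 text so that string order matches date order):

```python
import sqlite3

# Hypothetical sample data: product 'A' has 6 rows, product 'B' has only 2.
ROWS = [
    ("A", "2014-08-26 09:00:00"),
    ("A", "2014-08-27 08:00:00"),
    ("A", "2014-08-27 12:00:00"),
    ("A", "2014-08-27 17:30:00"),
    ("A", "2014-08-28 07:15:00"),
    ("A", "2014-08-28 10:45:00"),
    ("B", "2014-08-27 11:00:00"),
    ("B", "2014-08-28 09:00:00"),
]


def latest_per_frequent_product(conn):
    """Return (PRODUCT_NUMBER, MAX(SAVEDATE)) for products with more than 5 rows."""
    sql = """
        SELECT PRODUCT_NUMBER, MAX(SAVEDATE)
        FROM TKP_NOISE_MOTOR_RESULTS
        WHERE PRODUCT_NUMBER IN (
            SELECT PRODUCT_NUMBER
            FROM TKP_NOISE_MOTOR_RESULTS
            GROUP BY PRODUCT_NUMBER
            HAVING COUNT(*) > 5
        )
        GROUP BY PRODUCT_NUMBER
        ORDER BY PRODUCT_NUMBER
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TKP_NOISE_MOTOR_RESULTS (PRODUCT_NUMBER TEXT, SAVEDATE TEXT)")
conn.executemany("INSERT INTO TKP_NOISE_MOTOR_RESULTS VALUES (?, ?)", ROWS)
print(latest_per_frequent_product(conn))  # [('A', '2014-08-28 10:45:00')]
```

Only 'A' passes the HAVING filter, and only its latest date comes back; any other column of that row would need another lookup.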
But if you need other columns as well, it gets more complicated again.
You can use analytic functions to avoid hitting the table multiple times, and to avoid any joins:
SELECT PRODUCT_NUMBER, SAVEDATE --, other columns
FROM (
SELECT T.*,
ROW_NUMBER() OVER (PARTITION BY T.PRODUCT_NUMBER
ORDER BY T.SAVEDATE DESC) AS RN,
COUNT(*) OVER (PARTITION BY T.PRODUCT_NUMBER) AS CNT
FROM TKP_NOISE_MOTOR_RESULTS T
WHERE T.SAVEDATE BETWEEN DATE '2014-08-27' AND DATE '2014-08-28'
)
WHERE CNT > 5
AND RN = 1;
The inner query gets all columns from the base table and adds pseudo-columns based on analytic functions. ROW_NUMBER() assigns a value to each row for a given product, with the most recent date numbered 1 (through ORDER BY ... DESC). You could also consider RANK() or DENSE_RANK(), especially if you may have ties and want to show all the tied rows when one occurs. COUNT(*) counts the rows for each product.
The outer query then filters to keep only products with more than five rows, and only the first row for each product, which is the most recent one.
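To see the analytic version and the ROW_NUMBER()/RANK() difference on ties, here is a small sketch using SQLite (3.25 or later, for window functions) through Python's sqlite3 as a stand-in for Oracle. The ID column, the sample rows, and the ISO-text dates are all assumptions for illustration:

```python
import sqlite3

# Hypothetical rows (ID, PRODUCT_NUMBER, SAVEDATE), all inside the date range.
# Product 'C' has two rows tied on its latest SAVEDATE; 'B' has too few rows.
ROWS = [
    (1, "A", "2014-08-27 08:00:00"), (2, "A", "2014-08-27 09:00:00"),
    (3, "A", "2014-08-27 10:00:00"), (4, "A", "2014-08-27 11:00:00"),
    (5, "A", "2014-08-27 12:00:00"), (6, "A", "2014-08-27 13:00:00"),
    (7, "B", "2014-08-27 08:00:00"), (8, "B", "2014-08-27 09:00:00"),
    (9, "C", "2014-08-27 08:00:00"), (10, "C", "2014-08-27 09:00:00"),
    (11, "C", "2014-08-27 10:00:00"), (12, "C", "2014-08-27 11:00:00"),
    (13, "C", "2014-08-27 12:00:00"), (14, "C", "2014-08-27 12:00:00"),
]


def latest_rows(conn, ranking="ROW_NUMBER"):
    """One pass over the table, no join; `ranking` picks the analytic function."""
    if ranking not in ("ROW_NUMBER", "RANK", "DENSE_RANK"):
        raise ValueError(ranking)  # only splice known function names into the SQL
    sql = f"""
        SELECT ID, PRODUCT_NUMBER, SAVEDATE
        FROM (
            SELECT T.*,
                   {ranking}() OVER (PARTITION BY T.PRODUCT_NUMBER
                                     ORDER BY T.SAVEDATE DESC) AS RN,
                   COUNT(*) OVER (PARTITION BY T.PRODUCT_NUMBER) AS CNT
            FROM TKP_NOISE_MOTOR_RESULTS T
            WHERE T.SAVEDATE >= '2014-08-27' AND T.SAVEDATE < '2014-08-29'
        )
        WHERE CNT > 5 AND RN = 1
        ORDER BY PRODUCT_NUMBER, ID
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TKP_NOISE_MOTOR_RESULTS "
             "(ID INTEGER, PRODUCT_NUMBER TEXT, SAVEDATE TEXT)")
conn.executemany("INSERT INTO TKP_NOISE_MOTOR_RESULTS VALUES (?, ?, ?)", ROWS)
print(latest_rows(conn))          # one row each for 'A' and 'C'; which tied 'C' row is arbitrary
print(latest_rows(conn, "RANK"))  # both tied 'C' rows are returned
```

With ROW_NUMBER() exactly one row per product survives, so a tie is broken arbitrarily; RANK() keeps every row that shares the latest date.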
SQL Fiddle with original query and this one for the same data.
I also switched to using date literals; at a minimum you should use TO_DATE with an explicit format mask rather than relying on NLS session settings. Also note that BETWEEN is inclusive at both ends, so this query (and your original) will pick up rows stamped exactly at midnight on the 28th; you can use:
WHERE T.SAVEDATE >= DATE '2014-08-27'
AND T.SAVEDATE < DATE '2014-08-28'
... or, if you are trying to include all entries from both days, use < DATE '2014-08-29'. I am assuming the values have times; otherwise five entries for the same date would look identical, and you would need some other way to decide which one is the latest.
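The difference between the two range styles can be checked with a short Python sketch, using `datetime` values to stand in for Oracle DATE values that carry a time component (an assumption, as are the sample timestamps):

```python
from datetime import datetime, timedelta


def between(ts, lo, hi):
    """Inclusive at both ends, like SQL BETWEEN."""
    return lo <= ts <= hi


def half_open(ts, lo, hi):
    """Includes lo, excludes hi: the `>= ... AND < ...` pattern."""
    return lo <= ts < hi


start = datetime(2014, 8, 27)  # like DATE '2014-08-27': midnight
end = datetime(2014, 8, 28)    # like DATE '2014-08-28': also midnight

samples = [
    datetime(2014, 8, 27, 0, 0),
    datetime(2014, 8, 27, 23, 59, 59),
    datetime(2014, 8, 28, 0, 0),    # exactly midnight on the 28th
    datetime(2014, 8, 28, 14, 30),
]

print([between(ts, start, end) for ts in samples])    # [True, True, True, False]
print([half_open(ts, start, end) for ts in samples])  # [True, True, False, False]
# To cover both whole days, move the exclusive upper bound to the 29th.
print([half_open(ts, start, end + timedelta(days=1)) for ts in samples])  # [True, True, True, True]
```

BETWEEN catches only the single instant at midnight on the 28th and misses the rest of that day, which is rarely what is intended.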
Unless you are using a very old version of Oracle, you can use the analytic forms of COUNT() and ROW_NUMBER() to achieve the desired result. Try the following:
SELECT
*
FROM (
SELECT
TNMR.*
, COUNT(*) OVER (PARTITION BY TNMR.PRODUCT_NUMBER) AS CN
, ROW_NUMBER() OVER (PARTITION BY TNMR.PRODUCT_NUMBER
ORDER BY TNMR.SAVEDATE DESC) AS RN
FROM messfeld.TKP_NOISE_MOTOR_RESULTS TNMR
) T1
WHERE T1.CN >= 5 AND T1.RN = 1
AND T1.SAVEDATE BETWEEN '27-AUG-14' AND '28-AUG-14'
;
However, I really would not recommend dd-mon-yy strings such as '27-AUG-14' as date literals, and I never use BETWEEN for date ranges; I would use this instead:
AND T1.SAVEDATE >= to_date('27-08-2014','dd-mm-yyyy')
AND T1.SAVEDATE < to_date('28-08-2014','dd-mm-yyyy') + 1 -- 1 day added
Footnote: "select *" is a convenience only; it is used above for brevity and/or because the column details are unknown. Please list the columns you actually need in the select clause.
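Where the date condition sits matters. In the query above it is applied after COUNT() and ROW_NUMBER() have been computed over the whole table, so a product whose newest row lies outside the range is dropped entirely; filtering inside the inline view instead counts and ranks only the rows in the range. (The question's original query also took MAX(SAVEDATE) and the count over the whole table.) A sketch of both placements, using SQLite 3.25+ via Python's sqlite3 as a stand-in for Oracle, with invented rows and ISO-text dates as assumptions:

```python
import sqlite3

# Hypothetical rows (PRODUCT_NUMBER, SAVEDATE).
# 'A': six rows, five in the range, but its newest row (30 Aug) is outside it.
# 'D': five rows, all in the range.
ROWS = [
    ("A", "2014-08-27 09:00:00"), ("A", "2014-08-27 15:00:00"),
    ("A", "2014-08-28 08:00:00"), ("A", "2014-08-28 12:00:00"),
    ("A", "2014-08-28 16:00:00"), ("A", "2014-08-30 12:00:00"),
    ("D", "2014-08-27 08:00:00"), ("D", "2014-08-27 10:00:00"),
    ("D", "2014-08-27 12:00:00"), ("D", "2014-08-28 09:00:00"),
    ("D", "2014-08-28 11:00:00"),
]


def filter_after_ranking(conn):
    """Count and rank over all rows, then apply the date range (as in the query above)."""
    sql = """
        SELECT T1.PRODUCT_NUMBER, T1.SAVEDATE
        FROM (
            SELECT TNMR.*,
                   COUNT(*) OVER (PARTITION BY TNMR.PRODUCT_NUMBER) AS CN,
                   ROW_NUMBER() OVER (PARTITION BY TNMR.PRODUCT_NUMBER
                                      ORDER BY TNMR.SAVEDATE DESC) AS RN
            FROM TKP_NOISE_MOTOR_RESULTS TNMR
        ) T1
        WHERE T1.CN >= 5 AND T1.RN = 1
          AND T1.SAVEDATE >= '2014-08-27' AND T1.SAVEDATE < '2014-08-29'
        ORDER BY T1.PRODUCT_NUMBER
    """
    return conn.execute(sql).fetchall()


def filter_before_ranking(conn):
    """Apply the date range first, then count and rank only the rows inside it."""
    sql = """
        SELECT T1.PRODUCT_NUMBER, T1.SAVEDATE
        FROM (
            SELECT TNMR.*,
                   COUNT(*) OVER (PARTITION BY TNMR.PRODUCT_NUMBER) AS CN,
                   ROW_NUMBER() OVER (PARTITION BY TNMR.PRODUCT_NUMBER
                                      ORDER BY TNMR.SAVEDATE DESC) AS RN
            FROM TKP_NOISE_MOTOR_RESULTS TNMR
            WHERE TNMR.SAVEDATE >= '2014-08-27' AND TNMR.SAVEDATE < '2014-08-29'
        ) T1
        WHERE T1.CN >= 5 AND T1.RN = 1
        ORDER BY T1.PRODUCT_NUMBER
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TKP_NOISE_MOTOR_RESULTS (PRODUCT_NUMBER TEXT, SAVEDATE TEXT)")
conn.executemany("INSERT INTO TKP_NOISE_MOTOR_RESULTS VALUES (?, ?)", ROWS)
print(filter_after_ranking(conn))   # [('D', '2014-08-28 11:00:00')]
print(filter_before_ranking(conn))  # [('A', '2014-08-28 16:00:00'), ('D', '2014-08-28 11:00:00')]
```

Which behaviour is right depends on whether "most recent" means most recent overall or most recent within the two days.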
You seem to be looking for all the products for which
- there are more than five entries
- there is at least one entry on August 27 or 28.
From these, you want to take the newest record found within that date range.
So, select all products with more than five records (as you already did) and find the most recent save date within the date range using a CASE expression:
select *
from messfeld.tkp_noise_motor_results
where (product_number, savedate) in
(
select
product_number,
max(case when to_char(savedate, 'dd-mm-yyyy') in ('27-08-2014', '28-08-2014') then savedate end)
from messfeld.tkp_noise_motor_results
group by product_number
having count(*) > 5
-- the next line is not really needed. Use it if you find it more readable
and max(case when to_char(savedate, 'dd-mm-yyyy') in ('27-08-2014', '28-08-2014') then savedate end) is not null
);
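The conditional-aggregate idea can be tried out in SQLite (3.15 or later, for row-value IN) via Python's sqlite3, as a stand-in for Oracle. Since SQLite has no TO_CHAR, `substr(savedate, 1, 10)` on ISO-text dates plays that role here; that substitution and the sample rows are assumptions:

```python
import sqlite3

# Hypothetical rows (product_number, savedate).
# 'A': six rows, some on 27/28 Aug.  'B': six rows, none in range.  'E': too few rows.
ROWS = [
    ("A", "2014-08-26 10:00:00"), ("A", "2014-08-27 09:00:00"),
    ("A", "2014-08-27 15:00:00"), ("A", "2014-08-28 08:00:00"),
    ("A", "2014-08-28 16:00:00"), ("A", "2014-08-30 12:00:00"),
    ("B", "2014-08-25 08:00:00"), ("B", "2014-08-25 09:00:00"),
    ("B", "2014-08-25 10:00:00"), ("B", "2014-08-25 11:00:00"),
    ("B", "2014-08-25 12:00:00"), ("B", "2014-08-25 13:00:00"),
    ("E", "2014-08-27 08:00:00"), ("E", "2014-08-28 08:00:00"),
]


def newest_in_range(conn):
    """Count over all rows, but take MAX only over rows dated 27 or 28 Aug."""
    sql = """
        SELECT product_number, savedate
        FROM tkp_noise_motor_results
        WHERE (product_number, savedate) IN (
            SELECT product_number,
                   MAX(CASE WHEN substr(savedate, 1, 10) IN ('2014-08-27', '2014-08-28')
                            THEN savedate END)
            FROM tkp_noise_motor_results
            GROUP BY product_number
            HAVING COUNT(*) > 5
        )
        ORDER BY product_number
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tkp_noise_motor_results (product_number TEXT, savedate TEXT)")
conn.executemany("INSERT INTO tkp_noise_motor_results VALUES (?, ?)", ROWS)
print(newest_in_range(conn))  # [('A', '2014-08-28 16:00:00')]
```

For 'B' the CASE yields only NULLs, so its MAX is NULL and the row-value comparison never matches, which is why the optional HAVING line is not strictly needed.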