SQL Oracle Query: can this query be optimized?

I want to select the most recent records in a database table that contains multiple rows with the same product number (but different dates). In this particular case, I want to filter the table TKP_NOISE_MOTOR_RESULTS for every PRODUCT_NUMBER that occurs at least five times in the table, and from this subset I want to get the row with the most recent SAVEDATE for each product.

I managed to create such a query, but it uses three table references: T1, T2 and T3. I have a feeling that this could be done with only two, and that no inner join is required. It took me many hours to get this far, because translating the script from MySQL to Oracle was difficult.

Can the following query be optimized to require fewer subqueries?

SELECT * FROM
messfeld.TKP_NOISE_MOTOR_RESULTS T1
JOIN
(
  SELECT PRODUCT_NUMBER, COUNT(*)
  FROM messfeld.TKP_NOISE_MOTOR_RESULTS 
  GROUP BY PRODUCT_NUMBER
  HAVING COUNT(*)>5
) T2
ON T1.PRODUCT_NUMBER=T2.PRODUCT_NUMBER
WHERE T1.SAVEDATE BETWEEN '27-AUG-14' AND '28-AUG-14' AND
(T1.SAVEDATE, T1.PRODUCT_NUMBER) IN 
(
SELECT MAX(T3.SAVEDATE), T3.PRODUCT_NUMBER 
FROM messfeld.TKP_NOISE_MOTOR_RESULTS T3
WHERE
T2.PRODUCT_NUMBER=T3.PRODUCT_NUMBER
GROUP BY PRODUCT_NUMBER
);

      



3 answers


If you only want the product number and date, you can simplify your current approach a little; as you suspected, you are hitting the table more often than you need to, and you can replace the join with an IN clause:

SELECT PRODUCT_NUMBER, MAX(SAVEDATE)
FROM TKP_NOISE_MOTOR_RESULTS
WHERE PRODUCT_NUMBER IN (
  SELECT PRODUCT_NUMBER
  FROM TKP_NOISE_MOTOR_RESULTS 
  GROUP BY PRODUCT_NUMBER
  HAVING COUNT(*)>5
)
GROUP BY PRODUCT_NUMBER;

      

But if you want other columns as well, the query has to become more complex again.

You can use analytic functions to avoid hitting the table multiple times, and to avoid any joins:

SELECT PRODUCT_NUMBER, SAVEDATE --, other columns
FROM (
  SELECT T.*,
    ROW_NUMBER() OVER (PARTITION BY T.PRODUCT_NUMBER
      ORDER BY T.SAVEDATE DESC) AS RN,
    COUNT(*) OVER (PARTITION BY T.PRODUCT_NUMBER) AS CNT
  FROM TKP_NOISE_MOTOR_RESULTS T
  WHERE T.SAVEDATE BETWEEN DATE '2014-08-27' AND DATE '2014-08-28'
)
WHERE CNT > 5
AND RN = 1;

      

The inner query gets all columns from the base table and adds pseudo-columns based on analytic functions. ROW_NUMBER() assigns a number to each row for a given product, with the most recent date as number 1 (through ORDER BY ... DESC). You could also consider RANK() or DENSE_RANK(), especially if you may have ties and want to show all the rows when a tie occurs. COUNT(*) counts the rows for each product.
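To make the difference between the three ranking functions concrete, here is a minimal sketch. The SAMPLE_ROWS data is invented for illustration (it is not from the question's table): product A has two rows that share the latest SAVEDATE and one older row.

```sql
-- Hypothetical sample data: two rows tie on the most recent SAVEDATE.
WITH SAMPLE_ROWS AS (
  SELECT 'A' AS PRODUCT_NUMBER, DATE '2014-08-28' AS SAVEDATE FROM DUAL UNION ALL
  SELECT 'A', DATE '2014-08-28' FROM DUAL UNION ALL
  SELECT 'A', DATE '2014-08-27' FROM DUAL
)
SELECT PRODUCT_NUMBER, SAVEDATE,
  -- ROW_NUMBER breaks ties arbitrarily: the tied rows get 1 and 2
  ROW_NUMBER() OVER (PARTITION BY PRODUCT_NUMBER ORDER BY SAVEDATE DESC) AS RN,
  -- RANK gives tied rows the same value and then skips: 1, 1, 3
  RANK()       OVER (PARTITION BY PRODUCT_NUMBER ORDER BY SAVEDATE DESC) AS RNK,
  -- DENSE_RANK gives tied rows the same value without a gap: 1, 1, 2
  DENSE_RANK() OVER (PARTITION BY PRODUCT_NUMBER ORDER BY SAVEDATE DESC) AS DRNK,
  -- COUNT(*) OVER repeats the per-product row count (3) on every row
  COUNT(*)     OVER (PARTITION BY PRODUCT_NUMBER) AS CNT
FROM SAMPLE_ROWS
ORDER BY SAVEDATE DESC, RN;
```

Filtering on RN = 1 would return exactly one of the two tied rows, and which one is not deterministic; filtering on RNK = 1 or DRNK = 1 would return both.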



The outer query then keeps only the products that have more than five rows, and for each of them only the first row, which is the most recent one.

SQL Fiddle with original query and this one for the same data.

I also switched to using date literals; you should at least use TO_DATE with an explicit format mask rather than rely on NLS session settings. Also note that BETWEEN is inclusive, so this (and your original) will pick up rows at exactly midnight on the 28th; you can use:

  WHERE T.SAVEDATE >= DATE '2014-08-27'
  AND T.SAVEDATE < DATE '2014-08-28'

      

... or, if you are trying to include all entries from both days, use < DATE '2014-08-29' as the upper bound. I am assuming the values have time components; otherwise five entries for the same date will look the same and you need a different way to decide which is the latest.
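The effect of the three range conditions can be checked with a small sketch. The three timestamps in SAMPLE_ROWS are made up for illustration and sit just before, exactly at, and just after midnight on the 28th.

```sql
-- Hypothetical sample data around midnight on 2014-08-28.
WITH SAMPLE_ROWS AS (
  SELECT TO_DATE('2014-08-27 23:59:59', 'YYYY-MM-DD HH24:MI:SS') AS SAVEDATE FROM DUAL UNION ALL
  SELECT TO_DATE('2014-08-28 00:00:00', 'YYYY-MM-DD HH24:MI:SS') FROM DUAL UNION ALL
  SELECT TO_DATE('2014-08-28 00:00:01', 'YYYY-MM-DD HH24:MI:SS') FROM DUAL
)
SELECT TO_CHAR(SAVEDATE, 'YYYY-MM-DD HH24:MI:SS') AS SAVED_AT,
  -- inclusive on both ends: matches up to and including midnight on the 28th
  CASE WHEN SAVEDATE BETWEEN DATE '2014-08-27' AND DATE '2014-08-28'
       THEN 'Y' ELSE 'N' END AS IN_BETWEEN,
  -- half-open range: the 27th only
  CASE WHEN SAVEDATE >= DATE '2014-08-27' AND SAVEDATE < DATE '2014-08-28'
       THEN 'Y' ELSE 'N' END AS ONLY_27TH,
  -- half-open range: all of the 27th and the 28th
  CASE WHEN SAVEDATE >= DATE '2014-08-27' AND SAVEDATE < DATE '2014-08-29'
       THEN 'Y' ELSE 'N' END AS BOTH_DAYS
FROM SAMPLE_ROWS
ORDER BY 1;
```

The 23:59:59 row matches all three conditions. The row at exactly midnight matches BETWEEN and the two-day range but not the 27th-only range. The 00:00:01 row matches only the two-day range, which shows that BETWEEN with plain dates covers the whole of the 27th plus a single instant of the 28th.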



Unless you are using a very old version of Oracle, you can use the analytic forms of COUNT() and ROW_NUMBER() to achieve the desired result. Try the following:

SELECT
      *
FROM (
      SELECT
            TNMR.*
          , COUNT(*) OVER (PARTITION BY TNMR.PRODUCT_NUMBER) AS CN
          , ROW_NUMBER() OVER (PARTITION BY TNMR.PRODUCT_NUMBER
                               ORDER BY TNMR.SAVEDATE DESC) AS RN
      FROM messfeld.TKP_NOISE_MOTOR_RESULTS TNMR
      ) T1
WHERE T1.CN >= 5 AND T1.RN = 1
AND T1.SAVEDATE BETWEEN '27-AUG-14' AND '28-AUG-14'
;

      

However, I really would not recommend dd-mon-yy strings as date literals, and I never use BETWEEN for date ranges; I would use this instead:



AND T1.SAVEDATE >= to_date('27-08-2014','dd-mm-yyyy')
AND T1.SAVEDATE < to_date('28-08-2014','dd-mm-yyyy') + 1 -- 1 day added

      


FOOTNOTE: "select *" is a convenience only; it is used above for brevity and/or because the details are unknown. Please list the columns explicitly in the select clause.



You seem to be looking for all the products for which

  • there are more than five entries
  • there is at least one entry on August 27 or 28.

From these, you want to take the newest record found within that date range.

So, select all products with more than five records (as you already did) and determine the most recent save date within the date range using a CASE expression inside MAX.

select *
from messfeld.tkp_noise_motor_results 
where (product_number, savedate) in
(
  select
    product_number, 
    max(case when to_char(savedate, 'dd-mm-yyyy') in ('27-08-2014', '28-08-2014') then savedate end)
  from messfeld.tkp_noise_motor_results 
  group by product_number
  having count(*) > 5 
  -- the next line is not really needed. Use it if you find it more readable
  and max(case when to_char(savedate, 'dd-mm-yyyy') in ('27-08-2014', '28-08-2014') then savedate end) is not null
);
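As a rough check of how the conditional MAX behaves, here is a small sketch on invented data (SAMPLE_ROWS is hypothetical, generated with CONNECT BY LEVEL): product A has six rows, one per day from 24 to 29 August 2014, and product B has only two rows.

```sql
-- Hypothetical sample data, not taken from the real table.
WITH SAMPLE_ROWS AS (
  -- product 'A': six rows, 2014-08-24 through 2014-08-29
  SELECT 'A' AS PRODUCT_NUMBER, DATE '2014-08-23' + LEVEL AS SAVEDATE
  FROM DUAL CONNECT BY LEVEL <= 6
  UNION ALL
  -- product 'B': two rows, 2014-08-27 and 2014-08-28
  SELECT 'B', DATE '2014-08-26' + LEVEL
  FROM DUAL CONNECT BY LEVEL <= 2
)
SELECT PRODUCT_NUMBER,
  COUNT(*)      AS CNT,
  MAX(SAVEDATE) AS LATEST_OVERALL,
  -- CASE yields NULL outside the two days, and MAX ignores NULLs
  MAX(CASE WHEN TO_CHAR(SAVEDATE, 'dd-mm-yyyy') IN ('27-08-2014', '28-08-2014')
           THEN SAVEDATE END) AS LATEST_IN_RANGE
FROM SAMPLE_ROWS
GROUP BY PRODUCT_NUMBER
HAVING COUNT(*) > 5;
```

Only product A is returned, with CNT 6, LATEST_OVERALL on 29 August and LATEST_IN_RANGE on 28 August. Product B has rows inside the range but fails the count condition. Note that the count is taken over all of a product's rows, not just those inside the date range.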

      
