Optimizing a poorly performing query
I have the following query. It returns the correct results, but it performs very badly. I suspect the problem is related to the two comparison conditions in the INNER JOIN clause. Both fields have an index, but the MySQL query optimizer seems to ignore them. Here is my query:
EDIT: I changed the query to the version Gordon suggested below, since it returns the same results but is faster. The EXPLAIN output is still unsatisfactory and is shown below.
SELECT a.id
FROM pc a INNER JOIN
(SELECT correction_value, MAX(seenDate) mxdate
FROM pc FORCE INDEX (IDX_SEENDATE)
WHERE seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value
) b
ON a.correction_value = b.correction_value AND
a.seenDate = b.mxdate INNER JOIN
cameras c
ON c.camera_id = a.camerauid
WHERE c.in_out = 0;
EXPLAIN
+----+-------------+------------+------------+-------+-------------------+--------------+---------+----------+---------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+-------------------+--------------+---------+----------+---------+----------+---------------------------------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 2414394 | 100 | Using where; |
| | | | | | | | | | | | Using temporary; |
| | | | | | | | | | | | Using filesort |
+----+-------------+------------+------------+-------+-------------------+--------------+---------+----------+---------+----------+---------------------------------------+
| 1 | PRIMARY | a | NULL | ref | correction_value, | idx_seenDate | 5 | b.mxdate | 1 | 3.8 | Using where |
| | | | | | idx_seenDate, | | | | | | |
| | | | | | fk_camera_idx | | | | | | |
+----+-------------+------------+------------+-------+-------------------+--------------+---------+----------+---------+----------+---------------------------------------+
| 1 | PRIMARY | c | NULL | ALL | PRIMARY | NULL | NULL | NULL | 41 | 2.44 | Using where; |
| | | | | | | | | | | | Using join buffer (Block Nested Loop) |
+----+-------------+------------+------------+-------+-------------------+--------------+---------+----------+---------+----------+---------------------------------------+
| 2 | DERIVED | pc | NULL | range | correction_value, | idx_seenDate | 5 | NULL | 2414394 | 100 | Using index condition; |
| | | | | | idx_seenDate | | | | | | Using temporary; |
| | | | | | | | | | | | Using filesort |
+----+-------------+------------+------------+-------+-------------------+--------------+---------+----------+---------+----------+---------------------------------------+
How can I optimize this query while still getting the same result?
Let's start by focusing on the subquery.
SELECT correction_value,
MAX(seenDate) mxdate
FROM pc
WHERE seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value
Run this twice, with both of these composite indexes in place:
INDEX sc (seenDate, correction_value)
INDEX cs (correction_value, seenDate)
FORCE one index on the first run and the other on the second. Depending on which version of MySQL you are using, one of the indexes will perform better than the other.
I think later versions will prefer "cs" because it can hop through the index very quickly, reading only the highest seenDate for each correction_value.
Once you've figured out which composite index to use, remove the FORCE INDEX hint and drop the unused index, then try the whole query. The same index should serve the combined query just fine.
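The comparison described above could be run roughly as follows. This is only a sketch: it assumes the index names sc and cs from this answer and that you are free to add and drop indexes on pc. Timing each SELECT (without EXPLAIN) on your own data is what actually decides which index wins.

```sql
-- Add both candidate composite indexes (names taken from the answer above).
ALTER TABLE pc
    ADD INDEX sc (seenDate, correction_value),
    ADD INDEX cs (correction_value, seenDate);

-- Run 1: force the seenDate-first index.
EXPLAIN
SELECT correction_value, MAX(seenDate) AS mxdate
FROM pc FORCE INDEX (sc)
WHERE seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value;

-- Run 2: force the correction_value-first index.
EXPLAIN
SELECT correction_value, MAX(seenDate) AS mxdate
FROM pc FORCE INDEX (cs)
WHERE seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value;

-- After comparing, drop whichever index lost (sc is only an example here).
ALTER TABLE pc DROP INDEX sc;
```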
Since your task appears to be a "groupwise maximum" problem, I suggest checking whether any of the performance tips here apply: http://mysql.rjweb.org/doc.php/groupwise_max
Try this:
SELECT
a.id
FROM pc a
INNER JOIN
(SELECT correction_value, MAX(seenDate) mxdate
FROM pc
INNER JOIN cameras ON (cameras.camera_id = pc.camerauid AND cameras.in_out = 0)
WHERE pc.seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value) b ON (a.correction_value = b.correction_value AND a.seenDate = b.mxdate);
Use an index on the pc.seenDate column.
I would start by writing the query like this:
SELECT a.id
FROM pc a INNER JOIN
(SELECT correction_value, MAX(seenDate) mxdate
FROM pc
WHERE seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value
) b
ON a.correction_value = b.correction_value AND
a.seenDate = b.mxdate INNER JOIN
cameras c
ON c.camera_id = a.camerauid
WHERE c.in_out = 0; -- don't use single quotes if `in_out` is a number
The place to start with this query is indexes: pc(seenDate, correction_value) and cameras(camera_id, in_out).
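As a sketch, the suggested indexes could be created like this. The index names are hypothetical, and the pc index lists each column once, because MySQL rejects a key definition that names the same column twice.

```sql
-- Hypothetical index names; the column lists follow the suggestion above.
-- seenDate first supports the range filter, and correction_value makes the
-- index covering for the subquery.
CREATE INDEX idx_pc_seen_corr ON pc (seenDate, correction_value);

-- Lets the join to cameras and the in_out filter be answered from the index.
CREATE INDEX idx_cameras_id_inout ON cameras (camera_id, in_out);
```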
There may also be ways to rewrite the query if this is not enough.
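One possible rewrite, offered only as a sketch, replaces the self-join with a window function. This is a different technique from the query above, it needs MySQL 8.0 or later, and whether it is actually faster on your data is something to measure rather than assume. RANK() is used instead of ROW_NUMBER() so that rows tied on the latest seenDate are all returned, as they are in the join version.

```sql
-- Sketch only: requires MySQL 8.0+ (window functions).
SELECT r.id
FROM (
    SELECT id,
           camerauid,
           -- Rank rows within each correction_value, newest seenDate first.
           RANK() OVER (PARTITION BY correction_value
                        ORDER BY seenDate DESC) AS rnk
    FROM pc
    WHERE seenDate BETWEEN '2017-03-01' AND '2017-04-01'
      -- The join version never matches NULL correction_value rows,
      -- so exclude them here to keep the results the same.
      AND correction_value IS NOT NULL
) r
INNER JOIN cameras c
    ON c.camera_id = r.camerauid
WHERE r.rnk = 1
  AND c.in_out = 0;
```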
It is not clear from your question how the tables are indexed, but for this subquery
(SELECT correction_value, MAX(seenDate) mxdate
FROM pc FORCE INDEX (IDX_SEENDATE)
WHERE seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value
) b
you want a composite index on both fields, seenDate and correction_value:
CREATE INDEX seenCorr_ndx ON pc (seenDate, correction_value);
(you can then drop any index on seenDate alone, and I expect you won't need FORCE INDEX either).
You may need two composite indexes, one starting with seenDate and one starting with correction_value.
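A quick way to check this, sketched under the assumption that seenCorr_ndx has been created as above and that the single-column index is the idx_seenDate shown in the question's EXPLAIN output:

```sql
-- Without any hint, see which index the optimizer picks for the subquery.
-- Ideally the key column shows seenCorr_ndx and Extra includes "Using index".
EXPLAIN
SELECT correction_value, MAX(seenDate) AS mxdate
FROM pc
WHERE seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value;

-- If so, the single-column index is redundant. Remove every
-- FORCE INDEX (IDX_SEENDATE) hint first, or those queries will fail.
ALTER TABLE pc DROP INDEX idx_seenDate;
```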
The RDBMS uses the output of one step as the input to the next. The derived query applies a filter, so we can use it as the first step, then join to pc, then join to cameras.
Indexes: the ones mentioned by Gordon Linoff, or pc (id, correction_value, seendate) and cameras (camera_id, in_out).
The final query can be rewritten as follows:
SELECT b.id
-- add any other columns you want in the output here
FROM
(
SELECT correction_value, MAX(seenDate) mxdate
FROM pc
WHERE seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value
) a
INNER JOIN pc b
ON b.correction_value = a.correction_value
AND b.seenDate = a.mxdate
INNER JOIN cameras c
ON c.camera_id = b.camerauid
WHERE c.in_out = 0;