Speed-up: Query with GROUP BY
I use the following query to select, for each production, the earliest birthdate among its cast members.
SELECT production_cast.production_id, MIN(birthdate) FROM person
LEFT JOIN production_cast ON production_cast.person_id = person.id
WHERE birthdate IS NOT NULL
GROUP BY production_cast.production_id;
However, the IMDb dataset is very large and the query takes over 300 seconds to complete. Without GROUP BY and MIN, the query finishes in 0.2 seconds:
SELECT production_cast.production_id FROM person
LEFT JOIN production_cast ON production_cast.person_id = person.id
WHERE birthdate IS NOT NULL;
The storage engine is MyISAM. The MySQL version is 5.7.2. I tried BTREE indexes on:
- production_cast.production_id
- person.birthdate
- person.birthdate and person.id
- production_cast.id and production_cast.production_id
EXPLAIN output:
person: indexes: idx_Person_id_birthdate, idx_Person_id_birthdate, Extra: Using index; Using temporary; Using filesort
production_cast: type: ref, index: idx_Production_cast_person_id_production_id, Extra: Using index
person.id and production_cast.id are primary keys. production_cast.production_id is not a primary key, but it has an index. What can I do to speed up this query?
You can add some indexes to speed up data retrieval.
On production_cast:
- person_id
- id
On person:
- id
- birthdate
This way, the database does not need to read the table rows at all, only the data stored in the index. In addition, the order of the columns in each index speeds up the lookups. You should also qualify the birthdate column with its table alias (p.birthdate) to save parsing time:
SELECT pc.id, MIN(p.birthdate)
FROM person p
LEFT JOIN production_cast pc ON pc.person_id = p.id
WHERE p.birthdate IS NOT NULL
GROUP BY pc.id;
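As a rough sketch of the indexes listed above, the statements below create one composite index per table. The index names are hypothetical, and the column order simply follows the lists in this answer. Whether the optimizer actually uses them depends on your data, so it is worth checking with EXPLAIN.

```sql
-- Hypothetical index names; column order follows the lists above.
CREATE INDEX idx_pc_person_id_id ON production_cast (person_id, id);
CREATE INDEX idx_p_id_birthdate ON person (id, birthdate);

-- Check whether the Extra column reports "Using index" for both tables.
EXPLAIN
SELECT pc.id, MIN(p.birthdate)
FROM person p
LEFT JOIN production_cast pc ON pc.person_id = p.id
WHERE p.birthdate IS NOT NULL
GROUP BY pc.id;
```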
This is too long for a comment.
First, the LEFT JOIN is not required unless you care about persons who are not in any production. That seems unlikely. So your query becomes:
SELECT p.id, MIN(birthdate)
FROM person p JOIN
production_cast pc
ON pc.person_id = p.id
WHERE p.birthdate IS NOT NULL
GROUP BY pc.id;
Second, if production_cast.id is a primary key and person.id is a primary key, then the query cannot produce more than one row for any given production_cast.id. Therefore, the GROUP BY is not required:
SELECT p.id, p.birthdate
FROM person p JOIN
production_cast pc
ON pc.person_id = p.id
WHERE p.birthdate IS NOT NULL;
I suspect you intended a different table or a different aggregation key in production_cast, but your query is not doing what you think it does.
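If the intended aggregation key is production_id, as in the query from the question, the same simplification (an inner join instead of a LEFT JOIN) might look roughly like the sketch below. The index name and the earliest_birthdate alias are hypothetical, and the idea that an index leading with production_id may let MySQL avoid the temporary table and filesort is an assumption to verify with EXPLAIN on your own data.

```sql
-- Hypothetical index: leads with the GROUP BY column.
CREATE INDEX idx_pc_production_id_person_id
    ON production_cast (production_id, person_id);

-- Earliest birthdate per production, counting only cast members
-- whose birthdate is known.
SELECT pc.production_id, MIN(p.birthdate) AS earliest_birthdate
FROM production_cast pc JOIN
     person p
     ON p.id = pc.person_id
WHERE p.birthdate IS NOT NULL
GROUP BY pc.production_id;
```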