Speeding up a query with GROUP BY

I use the following query to select the earliest birthdate among the cast members of each production.

SELECT production_cast.production_id, MIN(birthdate) FROM person
LEFT JOIN production_cast ON production_cast.person_id = person.id
WHERE birthdate IS NOT NULL
GROUP BY production_cast.production_id;

      

However, the IMDb dataset is very large and the query takes over 300 seconds to complete. Without GROUP BY and MIN, the query finishes in 0.2 seconds:

SELECT production_cast.production_id FROM person
LEFT JOIN production_cast ON production_cast.person_id = person.id
WHERE birthdate IS NOT NULL;

      

The storage engine is MyISAM. The MySQL version is 5.7.2. I tried BTREE indexes on:

  • production_cast.production_id
  • person.birthdate
  • person.birthdate and person.id
  • production_cast.id and production_cast.production_id

EXPLAIN summary:

person: indexes: idx_Person_id_birthdate, idx_Person_id_birthdate; Extra: Using index; Using temporary; Using filesort

production_cast: type: ref; index: idx_Production_cast_person_id_production_id; Extra: Using index
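For reference, a summary like the one above is what MySQL reports when the query is prefixed with EXPLAIN. A minimal sketch, using the query from the question unchanged:

```sql
-- Prefix the slow query with EXPLAIN to see, per table, the access type,
-- the index chosen, and the Extra column (e.g. "Using temporary; Using filesort").
EXPLAIN
SELECT production_cast.production_id, MIN(birthdate)
FROM person
LEFT JOIN production_cast ON production_cast.person_id = person.id
WHERE birthdate IS NOT NULL
GROUP BY production_cast.production_id;
```

"Using temporary" and "Using filesort" on the first table indicate that the grouping is done through a temporary table followed by a sort, which is where the extra time over the ungrouped query is likely spent.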

person.id and production_cast.id are primary keys. production_cast.production_id is not a primary key, but it is indexed. What can I do to speed up this query?



2 answers


You can add some indexes to speed up data retrieval.

On production_cast:

  • person_id
  • id

On person:

  • id
  • birthdate
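As a rough sketch, the indexes listed above could be created as follows. This assumes the answer means one composite index per table, with the columns in the order listed; the index names are hypothetical, and the question's EXPLAIN output suggests an index on person (id, birthdate) may already exist.

```sql
-- Hypothetical index names; column order follows the lists above.
-- Composite index on production_cast so the join can be resolved from the index alone.
CREATE INDEX idx_pc_person_id_id
    ON production_cast (person_id, id);

-- Composite index on person covering the join key and the aggregated column.
CREATE INDEX idx_p_id_birthdate
    ON person (id, birthdate);
```

Re-running EXPLAIN after creating them shows whether the optimizer actually picks them up.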

This way the database does not need to fetch the full rows, only the data stored in the index. The order of the index columns also helps speed up the lookups. You should also qualify the birthdate column with its table alias to save parsing time:

SELECT pc.id
,      MIN(p.birthdate)
FROM   person p
LEFT 
JOIN   production_cast pc
ON     pc.person_id = p.id
WHERE  p.birthdate IS NOT NULL
GROUP
BY     pc.id;

      



It's too long for a comment.

First, the LEFT JOIN is not required unless you care about a person who is not in any production. That seems unlikely. So your query becomes:

SELECT p.id, MIN(birthdate)
FROM person p JOIN
     production_cast pc
     ON pc.person_id = p.id
WHERE p.birthdate IS NOT NULL
GROUP BY pc.id;

      



Second, if production_cast.id is a primary key and person.id is a primary key, then the query cannot produce duplicate rows for a given production_cast.id. Therefore, the GROUP BY is not required:

SELECT p.id, p.birthdate
FROM person p JOIN
     production_cast pc
     ON pc.person_id = p.id
WHERE p.birthdate IS NOT NULL;

      

I suspect you intend a different table or a different aggregation key in production_cast, but as written your query is not doing what you think it does.
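If the intended aggregation key is production_cast.production_id, as in the query shown in the question, the inner-join form might look like the sketch below. The column alias earliest_birthdate is an illustrative name, not something from the original schema.

```sql
-- Assumption: the grouping key is production_id (one row per production),
-- not production_cast.id (one row per cast entry).
SELECT pc.production_id,
       MIN(p.birthdate) AS earliest_birthdate
FROM person p
JOIN production_cast pc
     ON pc.person_id = p.id
WHERE p.birthdate IS NOT NULL
GROUP BY pc.production_id;
```

Here the GROUP BY is needed, because a production has many cast entries; whether it runs fast enough still depends on the indexes available.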
