How does this MySQL query work?
I am trying to understand how MySQL queries work with and without GROUP BY.
Imagine I have a film_actor table where each row pairs an actor_id with a film_id, so the same actor can appear in N different films.
I want to select the actors who appear in exactly 20 films:
SELECT actor_id FROM film_actor GROUP BY actor_id HAVING COUNT(film_id) = 20;
This query works and returns actor_ids that are featured in 20 movies. But what if I just did:
SELECT actor_id FROM film_actor HAVING COUNT(film_id) = 20;
Why does this query only return a value if I compare the count to the size of the film_actor table (5463 rows):
SELECT actor_id FROM film_actor HAVING COUNT(film_id) = 5463;
In this case, it returns actor_id = 1. Why? Is COUNT(film_id) computed without considering the corresponding actor_ids?
GROUP BY groups rows by the values of the listed columns and is commonly used with aggregate functions such as COUNT.
So your first query returns one row for each actor_id value, and HAVING keeps only the groups whose count is 20.
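A small, self-contained sketch of this. The rows are made-up sample data (not the 5463-row table from the question), and the threshold is 2 instead of 20 so the grouping is visible with only a handful of rows:

```sql
-- Hypothetical sample data: actor 1 is in 2 films, actor 2 in 1, actor 3 in 2.
DROP TABLE IF EXISTS film_actor;
CREATE TABLE film_actor (
    actor_id INT NOT NULL,
    film_id  INT NOT NULL
);
INSERT INTO film_actor (actor_id, film_id) VALUES
    (1, 10), (1, 11),
    (2, 10),
    (3, 12), (3, 13);

-- One row per actor_id, with that actor's own film count
-- (actor 1 -> 2, actor 2 -> 1, actor 3 -> 2).
SELECT actor_id, COUNT(film_id) AS films
FROM film_actor
GROUP BY actor_id;

-- HAVING filters those per-actor groups: only actors 1 and 3 remain.
SELECT actor_id
FROM film_actor
GROUP BY actor_id
HAVING COUNT(film_id) = 2;
```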
Without a GROUP BY clause, the aggregate function treats all rows as a single group. So your second query asks for an actor_id where the number of films is 20, but without grouping the count is 5463 (the number of rows in the table), so the condition is never true. When you compare against 5463 instead, the condition holds and one row comes back. The actor_id returned in that situation is indeterminate (it can come from any row).
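A sketch of the whole-table behaviour, again on made-up sample data (five rows standing in for the real 5463-row table). Without GROUP BY, the count is one number for the entire table:

```sql
-- Hypothetical five-row table standing in for the real film_actor.
DROP TABLE IF EXISTS film_actor;
CREATE TABLE film_actor (
    actor_id INT NOT NULL,
    film_id  INT NOT NULL
);
INSERT INTO film_actor (actor_id, film_id) VALUES
    (1, 10), (1, 11), (2, 10), (3, 12), (3, 13);

-- No GROUP BY: all five rows form a single group, so this returns one row: 5.
SELECT COUNT(film_id) AS total_rows FROM film_actor;

-- The per-actor counts (2, 1, 2) only exist once the rows are grouped.
SELECT actor_id, COUNT(film_id) AS films
FROM film_actor
GROUP BY actor_id;
```

With this data, on a MySQL server without ONLY_FULL_GROUP_BY, the question's ungrouped query would return a row for HAVING COUNT(film_id) = 5 but nothing for = 2, which mirrors the 5463 case.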
The second query has no GROUP BY. Using the aggregate function COUNT in a HAVING clause means that the query will return at most one row.
Compare with this query:
SELECT actor_id, COUNT(film_id) FROM film_actor
It returns a single row, for example:
actor_id COUNT(film_id)
-------- --------------
42 5463
(NOTE: MySQL returns a result for this query only when the ONLY_FULL_GROUP_BY SQL mode is disabled, which was the default before MySQL 5.7.5. Other databases reject the query with an error along the lines of "nonaggregated column not in GROUP BY". The problem is the reference to actor_id in the SELECT list: for this query to work in other databases, we would have to remove actor_id from the SELECT list. We can make MySQL behave the same way by including ONLY_FULL_GROUP_BY in sql_mode, which is the default from MySQL 5.7.5 onward.)
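A minimal sketch of enabling ONLY_FULL_GROUP_BY for the current session only. This particular SET replaces any other modes in the session's sql_mode list, so in practice you would usually add ONLY_FULL_GROUP_BY to the existing value rather than overwrite it:

```sql
-- Inspect the SQL modes currently active for this session.
SELECT @@SESSION.sql_mode;

-- Enable ONLY_FULL_GROUP_BY for this session (overwrites the other modes).
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- With the mode on, MySQL should reject this query, because actor_id is
-- neither aggregated nor listed in a GROUP BY clause.
SELECT actor_id, COUNT(film_id) FROM film_actor;
```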
Note that the value returned for actor_id comes from "some row". Which row it comes from is not deterministic; it can be any of them. The value returned for COUNT covers the entire table.
If you want a COUNT for each actor, you need a GROUP BY clause, as in the first query:
SELECT actor_id, COUNT(film_id) FROM film_actor GROUP BY actor_id
Starting from this query, you can add a HAVING clause, and you can also remove COUNT(film_id) from the SELECT list. But you cannot remove the GROUP BY without changing what COUNT(film_id) returns.
So let's say you had:
+----------+------------+---------+
| actor_id | actor_name | film_id |
+----------+------------+---------+
|        4 | John       |       3 |
|        4 | John       |       4 |
|        5 | Alex       |       3 |
+----------+------------+---------+
If we run:
SELECT actor_id, COUNT(film_id) AS Films FROM film_actor GROUP BY actor_id;
We would get:
+----------+-------+
| actor_id | Films |
+----------+-------+
|        4 |     2 |
|        5 |     1 |
+----------+-------+
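So we can filter on the count with HAVING (WHERE cannot be used here: it must come before GROUP BY and cannot refer to aggregate results):
SELECT actor_id, COUNT(film_id) AS Films FROM film_actor GROUP BY actor_id HAVING Films = 2;
This should return only actor_id 4. (MySQL accepts the alias Films in HAVING; in databases that do not, write HAVING COUNT(film_id) = 2.)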
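The sample table above can be reproduced end to end with a script like the following. It is a sketch: the column types are assumptions, and the filter repeats COUNT(film_id) in HAVING instead of using the alias so that it also runs on databases that do not allow aliases there:

```sql
-- Recreate the three sample rows (column types are assumed).
DROP TABLE IF EXISTS film_actor;
CREATE TABLE film_actor (
    actor_id   INT         NOT NULL,
    actor_name VARCHAR(50) NOT NULL,
    film_id    INT         NOT NULL
);
INSERT INTO film_actor (actor_id, actor_name, film_id) VALUES
    (4, 'John', 3),
    (4, 'John', 4),
    (5, 'Alex', 3);

-- Per-actor counts: actor 4 -> 2 films, actor 5 -> 1 film.
SELECT actor_id, COUNT(film_id) AS Films
FROM film_actor
GROUP BY actor_id;

-- The filter on the aggregate goes in HAVING (evaluated after grouping);
-- only actor_id 4 remains.
SELECT actor_id, COUNT(film_id) AS Films
FROM film_actor
GROUP BY actor_id
HAVING COUNT(film_id) = 2;
```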