How does this MySQL query work?

I am trying to understand how MySQL queries work with and without GROUP BY.

Imagine I have a film_actor table where each actor_id has a corresponding film_id, so the same actor can appear in N different films.

I want to select actors who participate in 20 films:

SELECT actor_id FROM film_actor GROUP BY actor_id HAVING COUNT(film_id) = 20;

      

This query works and returns actor_ids that are featured in 20 movies. But what if I just did:

SELECT actor_id FROM film_actor HAVING COUNT(film_id) = 20;

      

Why does this query only return values if I compare the count to the size of the film_actor table (5463)?

SELECT actor_id FROM film_actor HAVING COUNT(film_id) = 5463;

      

In this case, it returns actor_id = 1. Why? Does it count film_ids without considering the corresponding actor_ids?

+3




3 answers


GROUP BY groups the result rows by the values of the columns that follow it, and is commonly used with aggregate functions (for example, COUNT).

So your first query returns one row for each actor_id value, and HAVING limits the results to those groups where the count is 20.



Without a GROUP BY clause, the aggregate function acts on all rows. So your second query asks for an actor_id where the number of films is 20, but without grouping the count is 5463 (i.e. the number of rows in the table). The actor_id returned in this situation is undefined (i.e. it can be any of them).
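
As a rough illustration of the difference, here is a minimal sketch using a hypothetical three-row table (film_actor_demo and its rows are made up for this example, not taken from the question):

```sql
-- Hypothetical sample data: 3 rows, 2 distinct actors.
CREATE TABLE film_actor_demo (actor_id INT, film_id INT);
INSERT INTO film_actor_demo VALUES (4, 3), (4, 4), (5, 3);

-- With GROUP BY: COUNT is evaluated once per actor_id group.
SELECT actor_id, COUNT(film_id) AS films
FROM film_actor_demo
GROUP BY actor_id
ORDER BY actor_id;
-- two rows: (4, 2) and (5, 1)

-- Without GROUP BY: the whole table is treated as a single group.
SELECT COUNT(film_id) AS films
FROM film_actor_demo;
-- one row: 3
```

In the second query there is only one group, so a HAVING condition can only keep or discard that single row.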

+5




The second query has no GROUP BY. Using the aggregate function COUNT in the HAVING clause means that the query will return at most one row.

Compare with this query:

SELECT actor_id, COUNT(film_id) FROM film_actor

      

It returns one row, for example:

actor_id  COUNT(film_id)
--------  --------------
      42            5463

      

(NOTE: With ONLY_FULL_GROUP_BY disabled, which was the default before MySQL 5.7.5, MySQL will return a result for this query. Other databases will reject this query and raise an error like "non-aggregate not in GROUP BY". The problem is the reference to actor_id in the SELECT list. For this query to work in other databases, we would have to remove actor_id from the SELECT list. We can make MySQL behave the same way by setting sql_mode to include ONLY_FULL_GROUP_BY.)
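
A minimal sketch of that setting, assuming MySQL 5.7 or later (where ANY_VALUE() is available). Note that this SET statement replaces the whole session sql_mode, which is fine for experimenting but probably not what you want on a production connection:

```sql
-- Check which modes are active in the current session.
SELECT @@SESSION.sql_mode;

-- Enable strict GROUP BY checking for this session only.
-- Assumption: overwriting the other mode flags is acceptable here.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- Rejected now: actor_id is neither aggregated nor in a GROUP BY.
-- SELECT actor_id, COUNT(film_id) FROM film_actor;

-- Accepted: only the aggregate is selected.
SELECT COUNT(film_id) FROM film_actor;

-- Also accepted: ANY_VALUE() says explicitly that any row's actor_id will do.
SELECT ANY_VALUE(actor_id), COUNT(film_id) FROM film_actor;
```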



Note that the value returned for actor_id is the value from "some row". It is not deterministic which row this value comes from; it can be any row. The value returned for COUNT refers to the entire table.


If you want the COUNT for each actor, you need a GROUP BY clause, as in the first query.

    SELECT actor_id, COUNT(film_id) FROM film_actor GROUP BY actor_id

      

Starting with this query as a basis, you can add a HAVING clause. You can also remove COUNT(film_id) from the SELECT list. But you cannot remove GROUP BY without affecting what is returned for COUNT(film_id).
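
The progression described above could be sketched roughly like this, using the film_actor table and the threshold of 20 from the question:

```sql
-- Step 1: one row per actor, with that actor's film count.
SELECT actor_id, COUNT(film_id)
FROM film_actor
GROUP BY actor_id;

-- Step 2: keep only the groups whose count is exactly 20.
SELECT actor_id, COUNT(film_id)
FROM film_actor
GROUP BY actor_id
HAVING COUNT(film_id) = 20;

-- Step 3: drop the count from the SELECT list; HAVING still sees it.
SELECT actor_id
FROM film_actor
GROUP BY actor_id
HAVING COUNT(film_id) = 20;
```

Step 3 is the first query from the question. Removing GROUP BY from it would collapse all rows into one group again.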

+4




So let's say you had:

+---------------------------------+
| actor_id | actor_name | film_id |
+---------------------------------+
|        4 |       John |       3 |
|        4 |       John |       4 |
|        5 |       Alex |       3 |
+---------------------------------+

      

When running:

SELECT actor_id, COUNT(film_id) AS Films FROM film_actor GROUP BY actor_id;

      

We would get:

+------------------+
| actor_id | Films |
+------------------+
|        4 |     2 |
|        5 |     1 |
+------------------+

      

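So we can filter on the count. This needs HAVING rather than WHERE, because WHERE is evaluated before grouping and cannot come after GROUP BY:

SELECT actor_id, COUNT(film_id) AS Films FROM film_actor GROUP BY actor_id HAVING Films = 2;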

      

This should just return actor_id of 4.
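
If you would rather filter on the Films alias with WHERE, one option is to wrap the grouped query in a derived table. This is only a sketch; the alias per_actor is a name chosen here for illustration:

```sql
-- The inner query computes one count per actor;
-- the outer WHERE then filters those already-aggregated rows.
SELECT actor_id
FROM (
    SELECT actor_id, COUNT(film_id) AS Films
    FROM film_actor
    GROUP BY actor_id
) AS per_actor
WHERE Films = 2;
```

With the three sample rows shown earlier in this answer, this would also return only actor_id 4.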

+3

