PostgreSQL Error - Correlated Subquery?
I have a query like this:
SELECT t1.id,
(SELECT COUNT(t2.id)
FROM t2
WHERE t2.id = t1.id
) as num_things
FROM t1
WHERE num_things = 5;
The goal is to get the ID of all items that appear 5 times in another table. However, I am getting this error:
ERROR: column "num_things" does not exist
SQL state: 42703
I am probably doing something silly here as I am somewhat new to databases. Is there a way to fix this query so that I can access num_things? Or, if not, is there any other way to achieve this result?
A few important points about using SQL:
- You cannot use column aliases in the WHERE clause, but you can in the HAVING clause. This is the reason for the error you received.
- You can compute the count better with JOIN and GROUP BY than with a correlated subquery. It will often be much faster.
- Use the HAVING clause to filter groups.
This is how I would write this query:
SELECT t1.id, COUNT(t2.id) AS num_things
FROM t1 JOIN t2 USING (id)
GROUP BY t1.id
HAVING num_things = 5;
I realize that this query could skip the JOIN with t1, as in Charles Bretana's solution. But I am assuming that you want the query to include some other columns from t1.
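If your database rejects the alias in HAVING (PostgreSQL, as far as I know, does not resolve select-list aliases there), a variant that should still work is to repeat the aggregate expression instead of the alias. A minimal sketch, assuming the same t1 and t2 tables from the question:

```sql
-- Same join and grouping as above, but HAVING repeats the aggregate
-- rather than referring to the num_things alias.
SELECT t1.id, COUNT(t2.id) AS num_things
FROM t1 JOIN t2 USING (id)
GROUP BY t1.id
HAVING COUNT(t2.id) = 5;
```

The alias is still available to the client as the output column name; it just is not used inside the query itself.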
Re: question in comment:
The difference is that the WHERE clause is evaluated row by row, before GROUP BY reduces each group to a single row. The HAVING clause is evaluated after the groups are formed. Thus you cannot, for example, change the COUNT() of a group using HAVING; you can only exclude the group.
SELECT t1.id, COUNT(t2.id) as num
FROM t1 JOIN t2 USING (id)
WHERE t2.attribute = <value>
GROUP BY t1.id
HAVING num > 5;
In the above query, WHERE filters for rows that match a condition, and HAVING filters for groups that have a count of more than five.
What confuses most people is the case where there is no GROUP BY clause, so it seems that HAVING and WHERE are interchangeable.
WHERE is evaluated before the expressions in the select list. This might not be obvious because SQL syntax puts the select list first. Thus, you can save a lot of expensive computation by using WHERE to restrict rows.
SELECT <expensive expressions>
FROM t1
HAVING primaryKey = 1234;
If you use a query like this, the expressions in the select list are evaluated for every row, only to discard most of the results because of the HAVING condition. However, the query below evaluates the expressions only for the single row that matches the WHERE condition.
SELECT <expensive expressions>
FROM t1
WHERE primaryKey = 1234;
So, to recap, queries are run by the database engine according to a series of steps:
- Generate a set of rows from the table(s), including any rows produced by JOIN.
- Evaluate WHERE conditions against the set of rows, filtering out rows that do not match.
- Evaluate the expressions in the select list for each row in the set.
- Apply column aliases (note that this is a separate step, which means you cannot use aliases in expressions in the select list).
- Condense groups to a single row per group, according to the GROUP BY clause.
- Evaluate HAVING conditions against groups, filtering out groups that do not match.
- Sort the result according to the ORDER BY clause.
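The alias step in the recap above also suggests a way to keep the original correlated subquery: compute the alias in a derived table, so that it is already a real column by the time the outer WHERE runs. A minimal sketch, assuming the t1 and t2 tables from the question (the name counted is just an illustrative alias for the derived table):

```sql
-- The inner query computes num_things; the outer query filters on it.
-- "counted" is a hypothetical name for the derived table.
SELECT id, num_things
FROM (
    SELECT t1.id,
           (SELECT COUNT(t2.id)
            FROM t2
            WHERE t2.id = t1.id) AS num_things
    FROM t1
) AS counted
WHERE num_things = 5;
```

This keeps the shape of the original query, at the cost of still evaluating the subquery for every row of t1.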
I would like to mention that in PostgreSQL there is no way to use an aliased column in the HAVING clause.
i.e.
SELECT usr_id AS my_id FROM user HAVING my_id = 1
Does not work.
Another example that won't work:
SELECT su.usr_id AS my_id, COUNT(*) AS val FROM sys_user AS su GROUP BY su.usr_id HAVING val >= 1
This produces the same error: the column val is not known.
I am pointing this out because Bill Carwin wrote something that is wrong for Postgres:
"You cannot use column aliases in the WHERE clause, but you can in the HAVING clause. This is the reason for the error you received."