"in" and "not in" counts don't add up - what's wrong?
I have multiple servers. Some of them are assigned ips. I want to find out how many are not. There are clearly more servers than the ips assigned, but my db tells me there are no servers that don't have ips assigned ...
I'm here. Is my DB corrupted in some strange way?
SELECT COUNT(*)
FROM server
... returns:
+----------+
| count(*) |
+----------+
| 23088 |
+----------+
1 row in set (0.00 sec)
It:
SELECT COUNT(*)
FROM server
WHERE server_id IN (SELECT DISTINCT(server_id)
FROM ips)
... returns:
+----------+
| count(*) |
+----------+
| 13811 |
+----------+
1 row in set (0.01 sec)
It:
SELECT COUNT(*)
FROM server
WHERE server_id NOT IN (SELECT DISTINCT(server_id)
FROM ips);
... returns:
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.02 sec)
The results have been edited to protect the culprit, but you got the idea.
- All tables are InnoDB.
-
Check table
returns ok on both of these tables.
EDIT: thanks for the suggestion to use LEFT JOIN
. This definitely confirms that the problem is with a MySQL bug.
mysql> SELECT count(s.server_id) FROM server s LEFT JOIN ips i on s.server_id = i.server_id WHERE i.server_id IS NULL;
+--------------------+
| count(s.server_id) |
+--------------------+
| 9277 |
+--------------------+
1 row in set (0.04 sec)
Since 9277 + 13811 = 23088, this means all servers without ips + all servers with ips are indeed == all servers.
I have scheduled my mysql server upgrade for early next week. Stay with us.
Assuming the error truppo found is causing this, you can use this workaround:
select count(*)
from server s
left join ips i on i.server_id = s.server_id
where i.server_id is null
Above is i.server_id is null
true if left join
no match is found (just as all columns from i
could give null
for this situation).
What version of MySQL? There seems to be a bug in <5.0.25 / 5.1.12 which could be the culprit:
Bug # 21282 : NOT IN, more than 1000 returns incorrect results using INDEX:
Using
SELECT ... WHERE some_field NOT IN (...)
and then 1000 or more values ββin NOT IN will partially cause the server to return incorrect results if it has an INDEX / UNIQUE key on the string field. Less than 1000 criteria are working correctly.
Do you have Nulls in your columns?
server_id not in (ids)
does not match columns NULL
, so you only get non-NULL servers server_id
that are not included ips
. You want to use instead where server_id is null
.
Do you have a record that has NULL for server_id? Because that would have been ruled out in both cases.
if you have NULLs in your columns, they will evaluate to false in both cases. the result you get is in
+ not in
-nulls
select count(*)
from server
where server_id not in (select distinct(server_id) from ips)
or server_id is NULL
I would guess that something strange is happening with IN and NOT IN. This may be a bug or a "known limitation".
I would suggest first trying to answer your original question (servers without ip) and then looking at the data. This may give you an indication of what might be going on.
So, here are some alternative ideas to give you what you are looking for:
SELECT server_id
FROM server
MINUS
SELECT server_id
FROM ips
or
SELECT server_id
FROM server s LEFT JOIN ips i on s.server_id = i.server_id
WHERE i.server_id is null
As stated above, this can give you an idea of ββwhy the data is not "caught" by your original statements.