Why aren't indexes speeding up this query?
I have two tables, users and posts, each containing 500k records.
I want to find users who have made between 100 and 200 posts.
My query:
SELECT u.accountid, COUNT(*)
FROM users u
JOIN posts p
ON u.accountid = p.owneruserid
GROUP BY u.accountid
HAVING COUNT(*) BETWEEN 100 AND 200;
And I get a response in about a second.
I added indexes on the accountid and owneruserid columns of the users and posts tables, respectively, but the query didn't speed up. Why?
HAVING COUNT(*) BETWEEN 100 AND 200;
This part is the key to explaining why the indexes don't help here.
We only want the groups that have between 100 and 200 members, so we need an exact row count for every group. On top of that, the query has no restrictions (for example, a WHERE clause), so to count all the groups we have to go through all the records in the table.
An index such as a B-tree index helps you find specific rows that match a condition on the indexed column. Because the index keeps the data ordered, the engine can search it, much like a binary search, to locate just the subset it needs. But in our case we have to scan all the records, so it doesn't matter whether they are ordered or not.
This is why the index does not speed up the query.
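To make the contrast concrete, here is a rough sketch comparing a selective lookup with the grouped count from the question. The user id 42 is an arbitrary value chosen for illustration; it does not come from the question.

```sql
-- Selective lookup: the WHERE condition on the indexed column lets the
-- engine go straight to one user's rows via an index on posts(owneruserid).
-- (42 is a made-up user id, used only for illustration.)
SELECT COUNT(*)
FROM posts
WHERE owneruserid = 42;

-- Grouped count with no WHERE clause: every row in posts contributes to
-- some group's count, so all rows (or all index entries) have to be read.
SELECT owneruserid, COUNT(*)
FROM posts
GROUP BY owneruserid
HAVING COUNT(*) BETWEEN 100 AND 200;
```

The first query touches only a handful of rows, which is where an index pays off; the second has to visit everything no matter how the rows are ordered.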
You can simplify your query (assuming accountid is unique in users and every owneruserid in posts refers to an existing user):
SELECT p.owneruserid, COUNT(*)
FROM posts p
GROUP BY p.owneruserid
HAVING COUNT(*) BETWEEN 100 AND 200;
An index on posts(owneruserid) should help with this query. It is a covering index for the query, so the query might be faster.
Overall, the query seems to require scanning all of the data in posts in order to aggregate. HAVING cannot use an index. However, the query can use the covering index to reduce I/O.
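As a sketch, creating the index and checking the plan might look like this. The index name idx_posts_owneruserid is hypothetical, and EXPLAIN syntax and output differ between databases (the form below is accepted by MySQL and PostgreSQL), so treat the details as assumptions about your setup.

```sql
-- Hypothetical index name; the indexed column is the one used in GROUP BY,
-- which is the only column the simplified query reads.
CREATE INDEX idx_posts_owneruserid ON posts (owneruserid);

-- Ask the planner how it would run the simplified query.
-- If the covering index is used, MySQL shows "Using index" in the Extra
-- column, and PostgreSQL may report an "Index Only Scan".
EXPLAIN
SELECT owneruserid, COUNT(*)
FROM posts
GROUP BY owneruserid
HAVING COUNT(*) BETWEEN 100 AND 200;
```

Even when the plan does use the index, it still reads every index entry; the gain comes from the index being narrower than the table, not from skipping rows.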