Why aren't indexes speeding up this query?

I have two tables, users and posts, each containing 500k records.

I want to find users who have made between 100 and 200 posts.

My query:

SELECT u.accountid, COUNT(*)
FROM users u
JOIN posts p
ON u.accountid = p.owneruserid
GROUP BY u.accountid
HAVING COUNT(*) BETWEEN 100 AND 200;

      

And I get a response in about a second.

I added indexes on the accountid and owneruserid columns of the users and posts tables respectively, but the query didn't get any faster. Why?

+3




2 answers


HAVING COUNT(*) BETWEEN 100 AND 200;

      

This part is the key to explaining why the indexes don't help here.

We only want the groups that have between 100 and 200 members, which means we need an exact member count for every group. On top of that, the query has no restrictions (for example, a WHERE clause), so to count every group we have to go through all the records in the table.



An index, for example a B-tree index, helps you find the right item (row) based on a condition on the indexed column. Because the index keeps the data ordered, the database can use something like a binary search to find the subset it wants. But in our case every record has to be scanned, so it doesn't matter whether the records are ordered or not.

This is why the index does not speed up the query.
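
To make the contrast concrete, here is a minimal sketch in plain Python rather than in any particular database. The list owner_ids is a made-up stand-in for the posts.owneruserid column, and both helper functions are hypothetical names used only for illustration. Looking up one user benefits from sorted data, while the "count every group, then filter" step reads every entry whether the data is sorted or not.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

# Made-up stand-in for posts.owneruserid:
# user 1 has 3 posts, user 2 has 1 post, user 3 has 2 posts.
owner_ids = [3, 1, 2, 1, 3, 1]
sorted_ids = sorted(owner_ids)  # roughly the ordering an index would provide


def count_for_user(sorted_column, user_id):
    """Point lookup: binary search only needs O(log n) comparisons."""
    return bisect_right(sorted_column, user_id) - bisect_left(sorted_column, user_id)


def users_with_count_between(column, low, high):
    """Aggregate over all groups: every entry is read once, sorted or not."""
    counts = Counter(column)
    return sorted(user for user, n in counts.items() if low <= n <= high)


print(count_for_user(sorted_ids, 1))               # 3
print(users_with_count_between(owner_ids, 2, 3))   # [1, 3]
print(users_with_count_between(sorted_ids, 2, 3))  # [1, 3]
```

The last two calls return the same result and do the same amount of work, which is the point of the answer: ordering helps a search, not a full count.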

+3




You can simplify your query:

SELECT p.owneruserid, COUNT(*)
FROM posts p
GROUP BY p.owneruserid
HAVING COUNT(*) BETWEEN 100 AND 200;
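
A quick way to check that the simplified query returns the same rows as the original join is to run both on toy data. The sketch below uses Python's built-in sqlite3 module purely for illustration (the thread does not say which database is in use), a tiny made-up data set, and a threshold of 2 to 3 instead of 100 to 200 so the example stays small. It also assumes that accountid is unique and that every owneruserid is non-NULL and refers to an existing user; without those assumptions the two queries can differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (accountid INTEGER PRIMARY KEY);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, owneruserid INTEGER NOT NULL);
""")

# Toy data: user 1 has 3 posts, user 2 has 1 post, user 3 has 2 posts.
conn.executemany("INSERT INTO users (accountid) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO posts (owneruserid) VALUES (?)",
    [(1,), (1,), (1,), (2,), (3,), (3,)],
)

JOIN_QUERY = """
    SELECT u.accountid, COUNT(*)
    FROM users u
    JOIN posts p ON u.accountid = p.owneruserid
    GROUP BY u.accountid
    HAVING COUNT(*) BETWEEN 2 AND 3
"""

SIMPLE_QUERY = """
    SELECT p.owneruserid, COUNT(*)
    FROM posts p
    GROUP BY p.owneruserid
    HAVING COUNT(*) BETWEEN 2 AND 3
"""

# Sort the rows so the comparison does not depend on output order.
join_rows = sorted(conn.execute(JOIN_QUERY).fetchall())
simple_rows = sorted(conn.execute(SIMPLE_QUERY).fetchall())

print(join_rows)    # [(1, 3), (3, 2)]
print(simple_rows)  # [(1, 3), (3, 2)]
```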

      



An index on posts(owneruserid) should work for this query. It is a covering index for the query, so the query might be faster.

Overall, the query seems to require scanning all of the data in posts in order to aggregate. The HAVING clause cannot use an index. However, the query can use a covering index to reduce I/O.
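
One way to see this for yourself is to compare the query plan before and after creating the index. The following is a rough sketch using Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN. The choice of SQLite, the extra body column, the index name idx_posts_owneruserid, and the 2 to 3 threshold are all assumptions made for the example. Other databases expose plans differently (for example through EXPLAIN), and the exact plan text varies between versions, so no expected plan output is shown.

```python
import sqlite3

QUERY = """
    SELECT p.owneruserid, COUNT(*)
    FROM posts p
    GROUP BY p.owneruserid
    HAVING COUNT(*) BETWEEN 2 AND 3
"""


def plan(conn):
    """Return the human-readable steps of SQLite's plan for QUERY."""
    # The description text is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, owneruserid INTEGER NOT NULL, body TEXT)"
)
conn.executemany(
    "INSERT INTO posts (owneruserid, body) VALUES (?, ?)",
    [(uid, "text") for uid in (1, 1, 1, 2, 3, 3)],
)

plan_before = plan(conn)
rows_before = sorted(conn.execute(QUERY).fetchall())

# Hypothetical index name; it contains every column the query reads.
conn.execute("CREATE INDEX idx_posts_owneruserid ON posts (owneruserid)")

plan_after = plan(conn)
rows_after = sorted(conn.execute(QUERY).fetchall())

print("before:", plan_before)
print("after: ", plan_after)
print(rows_before == rows_after)  # True: the index never changes the result
```

In both plans the table or the index is still scanned from start to finish. What the index can change is how much data that scan has to read, since index entries carry only owneruserid and not the full row.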

+1








