MySQL performance using IN predicate
If I run the following queries, each one returns quickly (0.01 sec) and gives me the desired output.
SELECT tagId FROM tag WHERE name='programming'
SELECT COUNT(DISTINCT workcode) FROM worktag WHERE tagId=123 OR tagId=124
(let's assume the two tagId values ββwere the result of the first query)
I would like to combine these requests, so I only need to run it once:
SELECT COUNT(DISTINCT workcode) FROM worktag WHERE tagId IN (SELECT tagId FROM tag WHERE name='programming')
However, this request completes in about 1 minute and 20 seconds. I have indices worktag.workcode
, worktag.tagId
, tag.tagId
and tag.name
.
If I run DESCRIBE
in queries, the first two use indexes and the second uses the index for the subquery (on the table tag
), but does not use the indexes on the table worktag
.
Does anyone know why this might be?
NOTE. The table worktag
contains over 18 million records.
source to share
Why aren't you using a join instead of a subquery?
SELECT COUNT(DISTINCT workcode)
FROM worktag
LEFT JOIN tag
ON worktag.tagId = tag.tagID
WHERE tag.name = 'programming'
PS: Seems to be reported as a bug .
source to share
MySQL usually doesn't do that well with subqueries, even independent ones. The posters that discussed joining are right - if you have a choice, use join. If you cannot easily use a join (e.g. foo.x in (select y from bar where y = xxx limit 10)), you are better off using the limit on the IN MEMORY temporary table and use the join on it.
If you use MySQL a lot use EXPLAIN and you will see how it uses your indexes and things like that.
source to share
You tried:
SELECT COUNT(DISTINCT workcode) FROM worktag WHERE tagId IN (123, 124)
?
I'm not a MySQL expert, but it seems to me that you can look at a significant glitch in the query optimizer.
On the other hand, it's good for MySQL that it optimizes the OR in the second expression. I know of databases that will successfully optimize IN (), but not the OR version of the same boolean query.
source to share