MySQL performance using IN predicate

If I run the following queries, each one returns quickly (0.01 sec) and gives me the desired output.

SELECT tagId FROM tag WHERE name='programming'

SELECT COUNT(DISTINCT workcode) FROM worktag WHERE tagId=123 OR tagId=124

(let's assume the two tagId values ​​were the result of the first query)

I would like to combine these requests, so I only need to run it once:

SELECT COUNT(DISTINCT workcode) FROM worktag WHERE tagId IN (SELECT tagId FROM tag WHERE name='programming')

However, this request completes in about 1 minute and 20 seconds. I have indices worktag.workcode

, worktag.tagId

, tag.tagId

and tag.name

.

If I run DESCRIBE

in queries, the first two use indexes and the second uses the index for the subquery (on the table tag

), but does not use the indexes on the table worktag

.

Does anyone know why this might be?

NOTE. The table worktag

contains over 18 million records.

+2


source to share


6 answers


Why aren't you using a join instead of a subquery?

SELECT COUNT(DISTINCT workcode)
FROM worktag
LEFT JOIN tag
  ON worktag.tagId = tag.tagID
WHERE tag.name = 'programming'

      



PS: Seems to be reported as a bug .

+2


source


Recently the DBA informed me that the syntax WHERE x IN ( ... )

is a pain for the database. The connection is almost always better:



SELECT COUNT(DISTINCT wt.workcode) 
  FROM worktag wt, tag t 
 WHERE wt.tagId = t.tagId 
   AND t.name='programming'

      

+1


source


SELECT COUNT(DISTINCT workcode) 
FROM worktag 
inner join tag on worktag.tagid = tag.tagid
WHERE tag.name='programming'

      

+1


source


MySQL usually doesn't do that well with subqueries, even independent ones. The posters that discussed joining are right - if you have a choice, use join. If you cannot easily use a join (e.g. foo.x in (select y from bar where y = xxx limit 10)), you are better off using the limit on the IN MEMORY temporary table and use the join on it.

If you use MySQL a lot use EXPLAIN and you will see how it uses your indexes and things like that.

+1


source


You tried:

SELECT COUNT(DISTINCT workcode) FROM worktag WHERE tagId IN (123, 124)

      

?

I'm not a MySQL expert, but it seems to me that you can look at a significant glitch in the query optimizer.

On the other hand, it's good for MySQL that it optimizes the OR in the second expression. I know of databases that will successfully optimize IN (), but not the OR version of the same boolean query.

0


source


I think the optimizer is making some bad guesses. Replacing the query with an inner join may help.

0


source







All Articles