Why does a query with IN (subquery) take longer than a query with IN (discrete list)
It always bothered me why this request
SELECT
*
FROM
`TABLE`
WHERE `value` IN
(SELECT
val
FROM
OTHER_TABLE
WHERE `date` < '2014-01-01')
running orders is slower than running both of these queries sequentially
SELECT
`val`
FROM
OTHER_TABLE
WHERE `date` < '2014-01-01'
Result:
+----+
| val |
+-----+
| v1 |
| v2 |
| v3 |
| v7 |
| v12 |
+-----+
and this request:
SELECT
*
FROM
`TABLE`
WHERE `value` IN ('v1', 'v2', 'v3', 'v7', 'v12')
source to share
From the docs: (emphasis mine)
Optimizing a subquery for is
IN
not as efficient as it is for an operator=
or for an operatorIN(value_list)
.A typical case for poor
IN
subquery performance is when the subquery returns a small number of rows, but the outer query returns a large number of rows to compare against the result of the subquery.The problem is that for a statement that uses a subquery
IN
, the optimizer rewrites it as a correlated subquery. Consider the following operator, which uses an uncorrelated subquery:
SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
The optimizer rewrites the operator into a correlated subquery:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
If the inner and outer queries return rows M and N, respectively, then the execution time becomes of order
O(M×N)
rather thanO(M+N)
as it would be for an uncorrelated subquery.The implication is that a subquery
IN
can be much slower than a query written using an operatorIN(value_list)
that displays the same values that the subquery will return.
http://dev.mysql.com/doc/refman/5.7/en/subquery-restrictions.html
Hope this helps anyone who would be curious
source to share