MySQL selection with offset is faster than no offset
I have been tinkering with query efficiency for a paginated system to make the data fetch as fast as possible, but I am running into something I don't quite understand. As far as I know, when a limit with an offset is used, MySQL has to iterate over each row before the offset and then discard them, so in theory a query with an offset of 10,000 would be much slower than one without, which is usually true as in this case
select SQL_NO_CACHE * from `customers` where `NetworkID`='\func uuid()'
order by `DateTimeAdded` desc limit 0, 100;
/* finishes in 2.497 seconds */
select SQL_NO_CACHE * from `customers` where `NetworkID`='\func uuid()'
order by `DateTimeAdded` desc limit 10000, 100;
/* finishes in 2.702 seconds */
But if I use an inner join to join the table to itself with just the column UserID
to do the sort and constraint, then it will be faster faster with an offset of 10,000 than without it, which completely disgusts me. Example:
select SQL_NO_CACHE * from `customers`
inner join (select `UserID` from `customers` where `NetworkID`='\func uuid()'
order by `DateTimeAdded` desc limit 100)
as `Results` using(`UserID`)
/* finishes in 1.133 seconds */
select SQL_NO_CACHE * from `customers`
inner join (select `UserID` from `customers` where `NetworkID`='\func uuid()'
order by `DateTimeAdded` desc limit 10000, 100)
as `Results` using(`UserID`)
/* finishes in 1.120 seconds */
Why is a query using an offset always faster than a query without an offset?
Explains:
I have posted a google docs spreadsheet here with explains
content here
Note. The tests above ran in PHP loops 20 times each
Note 2 : customers
is a view, not a base table
source to share
Case 1: The optimizer can use an index on ORDER BY
. LIMIT 10
will be faster than LIMIT 10000,10
that because it can stop reading lines early.
Case 2: The optimizer cannot (or does not want to) use an index for ORDER BY
. In this case, the entire set of rows is collected (after WHERE
), this set is sorted, and only then OFFSET
and are applied LIMIT
. In this case, the value OFFSET
doesn't really matter; most of the time consumed row fetching, filtering and sorting.
INDEX(x,y)
SELECT ... WHERE x=2 ORDER BY y LIMIT ... -- case 1
SELECT ... WHERE x=2 AND deleted=0 ORDER BY y LIMIT ... -- case 2
INDEX(NetworkID, DateTimeAdded) -- composite
SELECT ... WHERE NetworkID='...' ORDER BY DateTimeAdded DESC ... -- Case 1
INDEX(NetworkID), INDEX(DateTimeAdded) -- separate
SELECT ... WHERE NetworkID='...' ORDER BY DateTimeAdded DESC ... -- Case 3
Case 3 can be like Case 1 because it can use INDEX(DateTimeAdded)
. Or the optimizer chooses to use a different index, it is slower in case 2. In any case, it is not as good as the use of a composite index that can handle both WHERE
, and ORDER BY
.
If you manage to get to Case 1, I recommend that you also "remember where you left off" to make Pagination even more effective. See my blog Pagination .
source to share