MySQL selection with offset is faster than no offset

I have been tinkering with query efficiency for a paginated system to make the data fetch as fast as possible, but I am running into something I don't quite understand. As far as I know, when a limit with an offset is used, MySQL has to iterate over each row before the offset and then discard them, so in theory a query with an offset of 10,000 would be much slower than one without, which is usually true as in this case

select SQL_NO_CACHE * from `customers` where `NetworkID`='\func uuid()' 
    order by `DateTimeAdded` desc limit 0, 100;
/* finishes in 2.497 seconds */

 select SQL_NO_CACHE * from `customers` where `NetworkID`='\func uuid()' 
   order by `DateTimeAdded` desc limit 10000, 100;
 /* finishes in 2.702 seconds */

      

But if I use an inner join to join the table to itself with just the column UserID

to do the sort and constraint, then it will be faster faster with an offset of 10,000 than without it, which completely disgusts me. Example:

select SQL_NO_CACHE * from `customers` 
    inner join (select `UserID` from `customers` where `NetworkID`='\func uuid()' 
        order by `DateTimeAdded` desc limit 100) 
    as `Results` using(`UserID`)
/* finishes in 1.133 seconds */

select SQL_NO_CACHE * from `customers` 
    inner join (select `UserID` from `customers` where `NetworkID`='\func uuid()' 
        order by `DateTimeAdded` desc limit 10000, 100) 
    as `Results` using(`UserID`)
/* finishes in 1.120 seconds */

      

Why is a query using an offset always faster than a query without an offset?


Explains:

I have posted a google docs spreadsheet here with explains

content here

Note. The tests above ran in PHP loops 20 times each

Note 2 : customers

is a view, not a base table

+3


source to share


1 answer


Case 1: The optimizer can use an index on ORDER BY

. LIMIT 10

will be faster than LIMIT 10000,10

that because it can stop reading lines early.

Case 2: The optimizer cannot (or does not want to) use an index for ORDER BY

. In this case, the entire set of rows is collected (after WHERE

), this set is sorted, and only then OFFSET

and are applied LIMIT

. In this case, the value OFFSET

doesn't really matter; most of the time consumed row fetching, filtering and sorting.

INDEX(x,y)
SELECT ... WHERE x=2               ORDER BY y LIMIT ... -- case 1
SELECT ... WHERE x=2 AND deleted=0 ORDER BY y LIMIT ... -- case 2

INDEX(NetworkID, DateTimeAdded)         -- composite
SELECT ... WHERE NetworkID='...' ORDER BY DateTimeAdded DESC ... -- Case 1

INDEX(NetworkID), INDEX(DateTimeAdded)  -- separate
SELECT ... WHERE NetworkID='...' ORDER BY DateTimeAdded DESC ... -- Case 3

      



Case 3 can be like Case 1 because it can use INDEX(DateTimeAdded)

. Or the optimizer chooses to use a different index, it is slower in case 2. In any case, it is not as good as the use of a composite index that can handle both WHERE

, and ORDER BY

.

If you manage to get to Case 1, I recommend that you also "remember where you left off" to make Pagination even more effective. See my blog Pagination .

Learn more about creating INDEX .

+1


source







All Articles