Simple query optimization (WHERE + ORDER + LIMIT)

I have this query that is incredibly slow (4 minutes):

SELECT * FROM `ad` WHERE `ad`.`user_id` = USER_ID ORDER BY `ad`.`id` desc LIMIT 20;

The ad table contains approximately 10 million rows.

SELECT COUNT(*) FROM `ad` WHERE `ad`.`user_id` = USER_ID;

This returns 10k (the number of rows for that user).

The table has the following indexes:

  PRIMARY KEY (`id`),
  KEY `idx_user_id` (`user_id`,`status`,`sorttime`),

EXPLAIN gives the following:

           id: 1
  select_type: SIMPLE
        table: ad
         type: index
possible_keys: idx_user_id
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4249
        Extra: Using where

I don't understand why it takes so long. This query is generated by an ORM (for pagination), so it would be nice to optimize it from the outside (maybe by adding an extra index).

By the way, this query is fast:

select aa.*
from (select id from ad where user_id=USER_ID order by id desc limit 20) as a
join ad as aa on a.id = aa.id;

Edit: I tried another user with far fewer rows (dozens) than the original one. I am wondering why the original query does not use idx_user_id:

EXPLAIN SELECT * FROM `ad` WHERE `ad`.`user_id` = ANOTHER_ID ORDER BY `ad`.`id` desc LIMIT 20;

           id: 1
  select_type: SIMPLE
        table: ad
         type: ref
possible_keys: idx_user_id
          key: idx_user_id
      key_len: 3
          ref: const
         rows: 84
        Extra: Using where; Using filesort

Edit2: With Alexander's help, I tried making MySQL use the index I want, and the following query is much faster (1 s instead of 4 minutes):

SELECT * 
FROM `ad` USE INDEX (idx_user_id)
WHERE `ad`.`user_id` = 1884774
ORDER BY `ad`.`id` desc LIMIT 20;

1 answer


In the EXPLAIN output you can see that the value of key is PRIMARY. This means that the MySQL optimizer decided it was faster to scan the table records in primary key order (they are already sorted by id) and look for the first 20 records with the given user_id than to use the idx_user_id key, which the optimizer considered as a possible key and then rejected.

In your second query, the optimizer sees that the subquery needs only id values, so it chose the idx_user_id index instead, since this index allows the list of ids to be computed without touching the table itself. Then only 20 records are retrieved by direct lookup on the primary key value, which is very fast for such a small number of records.
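One way to check this explanation is to run EXPLAIN on the inner query by itself. This is only a sketch: it assumes the table uses InnoDB (where secondary indexes also store the primary key), and USER_ID is the same placeholder used in the question. If idx_user_id really covers the subquery, the Extra column would be expected to include "Using index".

```sql
-- Inner query from the fast rewrite, examined on its own.
-- USER_ID is a placeholder; substitute a real user id before running.
EXPLAIN
SELECT id
FROM ad
WHERE user_id = USER_ID
ORDER BY id DESC
LIMIT 20;
```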

As your query with ANOTHER_ID shows, MySQL's wrong decision was based on the number of rows for the previous user_id value. This number was so large that the optimizer guessed it would find the first 20 records with that particular user_id faster by simply reading the table records themselves and skipping records with other user_id values.

If the rows of a table are accessed through an index, this requires random access operations. On a typical hard disk, random access is about 100 times slower than a sequential scan. Therefore, for the index to be useful, it must reduce the number of rows to less than roughly 1% of the total number of rows. If the rows for a particular user_id value make up more than 1% of the total, it may be more efficient to perform a full table scan instead of using an index when we want to get all those rows. But the MySQL optimizer ignores the fact that only 20 of these rows will be retrieved. Therefore, it mistakenly decided not to use the index and to perform a full table scan instead.

To make the query fast for any user_id value, you can add another index:

create index idx_user_id_2 on ad(user_id, id);

This index allows MySQL to perform both the filtering and the sorting. For this to work, the columns used for filtering must come first in the index, and the columns used for ordering second. MySQL should be smart enough to use this index, because it lets MySQL find exactly the records it needs without skipping any.
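After creating the index, it may be worth confirming that the optimizer actually picks it for the unmodified ORM query. A minimal check, using the user id from the question's USE INDEX example as a sample value:

```sql
-- Optionally refresh index statistics so the optimizer sees the new index.
ANALYZE TABLE ad;

-- The ORM-generated query, unchanged (no index hint).
EXPLAIN
SELECT * FROM `ad`
WHERE `ad`.`user_id` = 1884774
ORDER BY `ad`.`id` DESC
LIMIT 20;
```

If the reasoning above holds, key should show idx_user_id_2 and Extra should no longer contain "Using filesort". If the optimizer still chooses PRIMARY, the USE INDEX hint from the question can name idx_user_id_2 instead, though that means changing the generated SQL.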
