Using an index when SELECTing from a MySQL join

I have the following two MySQL / MariaDB tables:

CREATE TABLE requests (
  request_id      BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  unix_timestamp  DOUBLE NOT NULL,
  [...]
  INDEX unix_timestamp_index (unix_timestamp)
);

CREATE TABLE served_objects (
  request_id      BIGINT UNSIGNED NOT NULL,
  object_name     VARCHAR(255) NOT NULL,
  [...]
  FOREIGN KEY (request_id) REFERENCES requests (request_id)
);

      

Each table contains several million rows. Each request has zero or more served_objects. I have a view that provides a complete picture of served_objects by joining these two tables:

CREATE VIEW served_objects_view AS
SELECT
  r.request_id AS request_id,
  unix_timestamp,
  object_name
FROM requests r
RIGHT JOIN served_objects so ON r.request_id=so.request_id;

      

It all seems pretty simple so far. But when I do a simple SELECT like this:

SELECT * FROM served_objects_view ORDER BY unix_timestamp LIMIT 5;

      

It takes a full minute or more. Obviously it isn't using the index. I've tried many different approaches, including swapping the table order and using LEFT or INNER JOIN instead, to no avail.

This is the EXPLAIN result for this SELECT:

+------+-------------+-------+--------+---------------+---------+---------+------------------+---------+---------------------------------+
| id   | select_type | table | type   | possible_keys | key     | key_len | ref              | rows    | Extra                           |          
+------+-------------+-------+--------+---------------+---------+---------+------------------+---------+---------------------------------+
|    1 | SIMPLE      | so    | ALL    | NULL          | NULL    | NULL    | NULL             | 5196526 | Using temporary; Using filesort | 
|    1 | SIMPLE      | r     | eq_ref | PRIMARY       | PRIMARY | 8       | db.so.request_id |       1 |                                 |
+------+-------------+-------+--------+---------------+---------+---------+------------------+---------+---------------------------------+

      

Is there something fundamental here that prevents the index from being used? I understand that it needs to use a temporary table to satisfy the view and that this interferes with the ability to use an index. But I hope there is some trick that will allow me to SELECT from the view while honoring the indexes on the requests table.



3 answers


You are using a notorious performance antipattern.

 SELECT * FROM served_objects_view ORDER BY unix_timestamp LIMIT 5;

      

You told the query planner to make a copy of your entire view (in RAM or temporary storage), sort it, and throw away all but five rows. So it obeyed, no matter how long that took.

SELECT * is generally considered harmful to query performance, and this is a case where that is true.

Try this deferred-join optimization:



SELECT a.* 
  FROM served_objects_view a
  JOIN (
         SELECT request_id
           FROM served_objects_view 
          ORDER BY unix_timestamp
          LIMIT 5
        ) b ON a.request_id = b.request_id

      

This sorts a smaller subset of the data (only the request_id and timestamp values). Then it fetches a small subset of the view's rows.

If this is too slow for your purposes, try creating a composite index on requests (unix_timestamp, request_id). But this is probably not necessary. If it is, focus on optimizing the subquery.
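The composite index mentioned above could be created along these lines. This is only a sketch: the index name is a hypothetical choice, not something from the original schema.

```sql
-- Composite index covering the ORDER BY column and the join column.
-- The index name is hypothetical; any unused name works.
ALTER TABLE requests
  ADD INDEX unix_timestamp_request_id_index (unix_timestamp, request_id);
```

Re-running EXPLAIN on the subquery afterwards shows whether the optimizer actually picks the new index.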

Note: RIGHT JOIN? Really? Don't you want a plain JOIN?
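For illustration, the view from the question rewritten with a plain (inner) JOIN might look like the sketch below. It assumes the foreign key on served_objects.request_id is enforced, so every served_objects row has exactly one matching requests row and the inner join returns the same rows as the RIGHT JOIN.

```sql
-- Same columns as the original view, with INNER JOIN instead of RIGHT JOIN.
CREATE OR REPLACE VIEW served_objects_view AS
SELECT
  r.request_id     AS request_id,
  r.unix_timestamp AS unix_timestamp,
  so.object_name   AS object_name
FROM requests r
JOIN served_objects so ON r.request_id = so.request_id;
```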



VIEWs are not always well optimized. Does the query run slowly when you use the underlying SELECT directly? Have you added the suggested index?

What version of MySQL / MariaDB are you using? There may have been optimization improvements in newer versions, and an upgrade might help.

My point is that you may have to give up the VIEW.
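To check whether the view itself is the problem, the same query can be written directly against the base tables. This is a sketch using only the tables and columns shown in the question; prefixing it with EXPLAIN shows the plan without running it.

```sql
-- The view's SELECT inlined, with the ORDER BY and LIMIT applied directly.
EXPLAIN
SELECT r.request_id, r.unix_timestamp, so.object_name
  FROM requests r
  JOIN served_objects so ON so.request_id = r.request_id
 ORDER BY r.unix_timestamp
 LIMIT 5;
```

If the plan differs from the one produced through the view, that points at view handling in the installed version rather than at the schema.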



The answer provided by O. Jones was the correct approach; thank you! The big win here is that if the inner SELECT only refers to columns from the requests table (for example, when SELECTing only request_id), the optimizer can satisfy the view without performing a join, which makes it fast.

I had to make two adjustments to make it produce the same results as the original SELECT. First, if non-unique request_ids are returned by the inner SELECT, the outer JOIN creates a cross-product of these non-unique rows. The resulting duplicate rows can be removed by changing the outer SELECT to SELECT DISTINCT.

Second, if the ORDER BY column may contain non-unique values, the result may contain extraneous rows. They can be discarded by also SELECTing orderByCol in the inner query and adding AND a.orderByCol = b.orderByCol to the JOIN condition.

So my final solution, which works well if orderByCol comes from the requests table, looks like this:

SELECT DISTINCT a.*
  FROM served_objects_view a
  JOIN (
    SELECT request_id, <orderByCol> FROM served_objects_view
    <whereClause>
    ORDER BY <orderByCol> LIMIT <startRow>,<nRows>
  ) b ON a.request_id = b.request_id AND a.<orderByCol> = b.<orderByCol>
  ORDER BY <orderByCol>;

      

This is a more complex solution than I was hoping, but it works, so I'm happy.
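As a concrete illustration, here is the template above filled in for the query from the question, assuming no WHERE clause, unix_timestamp as the ORDER BY column, and the first five rows:

```sql
-- Template instantiated with <orderByCol> = unix_timestamp,
-- <startRow> = 0, <nRows> = 5, and an empty <whereClause>.
SELECT DISTINCT a.*
  FROM served_objects_view a
  JOIN (
    SELECT request_id, unix_timestamp
      FROM served_objects_view
     ORDER BY unix_timestamp
     LIMIT 0, 5
  ) b ON a.request_id = b.request_id
     AND a.unix_timestamp = b.unix_timestamp
 ORDER BY a.unix_timestamp;
```

Note that the outer query returns every view row belonging to the selected requests, so it can return more than five rows when one request served several objects. An outer LIMIT could cap that if needed.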

One last comment. INNER JOIN and RIGHT JOIN are effectively the same thing here, so I originally phrased it in terms of RIGHT JOIN because that's how I thought of it. However, after some experimentation (prompted by your challenge), I found that the INNER JOIN is much more efficient. (This is what allows the optimizer to satisfy the view without performing a join if the inner SELECT only refers to columns from the requests table.) Thanks again!
