ORDER BY optimization
I am trying to optimize this query by sorting posts
by field reputation
(1st) and then id
(2nd). Without the 1st request, the field takes ~ 0.250 seconds, but with it it takes ~ 2.500 seconds (which means 10 times less, terrible). Any suggestion?
SELECT -- everything is ok here
FROM posts AS p
ORDER BY
-- 1st: sort by reputation if exists (1 reputation = 1 day)
(CASE WHEN p.created_at >= unix_timestamp(now() - INTERVAL p.reputation DAY)
THEN +p.reputation ELSE NULL END) DESC, -- also used 0 instead of NULL
-- 2nd: sort by id dec
p.id DESC
WHERE p.status = 'published' -- the only thing for filter
LIMIT 0,10 -- limit provided as well
Notes:
- Using InnoDB (MySQL 5.7.19)
- Primary id
in posts
Table
- Fields indexed both created_at
, andreputation
Explain the result:
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra # '1', 'SIMPLE', 'p', NULL, 'ALL', NULL, NULL, NULL, NULL, '31968', '100.00', 'Using filesort'
UPDATE ^^
Reputation gives the following: a message for how many (n = reputation) days can be displayed at the top of the list.
In fact, I tried to give a reputation to some posts that could be found at the top of the list and find this solution: Ordering posts by "rep" but only for "one" daily limit . But after a while (about 2 years) this solution became a problem due to the increase in the amount of tabular data. If I cannot resolve this, I must remove this function from the service.
UPDATE ^^
-- all date are unix timestamp (bigint)
SELECT p.*
, u.name user_name, u.status user_status
, c.name city_name, t.name town_name, d.name dist_name
, pm.meta_name, pm.meta_email, pm.meta_phone
-- gets last comment as json
, (SELECT concat("{",
'"id":"', pc.id, '",',
'"content":"', replace(pc.content, '"', '\\"'), '",',
'"date":"', pc.date, '",',
'"user_id":"', pcu.id, '",',
'"user_name":"', pcu.name, '"}"') last_comment_json
FROM post_comments pc
LEFT JOIN users pcu ON (pcu.id = pc.user_id)
WHERE pc.post_id = p.id
ORDER BY pc.id DESC LIMIT 1) AS last_comment
FROM posts p
-- no issues with these
LEFT JOIN users u ON (u.id = p.user_id)
LEFT JOIN citys c ON (c.id = p.city_id)
LEFT JOIN towns t ON (t.id = p.town_id)
LEFT JOIN dists d ON (d.id = p.dist_id)
LEFT JOIN post_metas pm ON (pm.post_id = p.id)
WHERE p.status = 'published'
GROUP BY p.id
ORDER BY
-- everything okay until here
-- any other indexed fields makes query slow, not just "case" part
(CASE WHEN p.created_at >= unix_timestamp(now() - INTERVAL p.reputation DAY)
THEN +p.reputation ELSE NULL END) DESC,
-- only id field (primary) is effective, no other indexes
p.id DESC
LIMIT 0,10;
Explain;
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra 1, PRIMARY, p,, ref, PRIMARY, user_id, status, reputation, created_at, city_id-town_id-dist_id, title-content, status, 1, const, 15283, 100.00, Using index condition; Using temporary; Using filesort # dunno, these join are not using, but if i remove returning fields from select part show "Using index condition" 1, PRIMARY, u,, eq_ref, PRIMARY, PRIMARY, 2, p.user_id, 1, 100.00, 1, PRIMARY, c,, eq_ref, PRIMARY, PRIMARY, 1, p.city_id, 1, 100.00, 1, PRIMARY, t,, eq_ref, PRIMARY, PRIMARY, 2, p.town_id, 1, 100.00, 1, PRIMARY, d,, eq_ref, PRIMARY, PRIMARY, 2, p.dist_id, 1, 100.00, 1, PRIMARY, pp,, eq_ref, PRIMARY, PRIMARY, 2, p.id, 1, 100.00, 2, DEPENDENT SUBQUERY, pc,, ref, post_id, visibility, status, post_id, 2, func, 2, 67.11, Using index condition; Using where; Using filesort 2, DEPENDENT SUBQUERY, pcu,, eq_ref, PRIMARY, PRIMARY, 2, pc.user_id, 1, 100.00,
source to share
This is a very interesting request. While optimizing it, you can discover and understand a lot of new information about how MySQL works. I'm not sure if I will have time to write everything in detail at once, but I can update gradually.
Why is it slow
There are basically two scenarios, fast and slow.
In a quick script, you are walking in a specific order over a table and probably at the same time quickly fetch some data by id for each row from other tables. In this case, you will stop walking as soon as you have enough rows specified in the LIMIT clause. Where does the order come from? From the b-tree index you have on the table, or the order of the result set in a subquery.
In a slow scenario, you don't have this predefined order, and MySQL has to implicitly put all data in a temporary table, sort the table in some field, and return n rows from your LIMIT clause. If any of the fields you put into the temporary table are of type TEXT (not VARCHAR), MySQL doesn't even try to save that table to RAM and dumps and sorts it on disk (hence the extra I / O handling).
First thing to fix
There are many situations where you cannot create an index that will allow you to follow its order (for example, when ORDER BY columns are from different tables), so a rule of thumb in these situations is to minimize the data that MySQL will put in the temporary table. How can you do this? You only select row ids in the subquery and after you have ids, you concatenate ids to the table itself and other tables to retrieve the content. That is, you make a small table with an order and then use a quick script. (This is a bit at odds with SQL in general, but every taste of SQL has its own means of optimizing queries this way.)
By the way, yours SELECT -- everything is ok here
looks funny, as this is the first place where it is not normal.
SELECT p.*
, u.name user_name, u.status user_status
, c.name city_name, t.name town_name, d.name dist_name
, pm.meta_name, pm.meta_email, pm.meta_phone
, (SELECT concat("{",
'"id":"', pc.id, '",',
'"content":"', replace(pc.content, '"', '\\"'), '",',
'"date":"', pc.date, '",',
'"user_id":"', pcu.id, '",',
'"user_name":"', pcu.name, '"}"') last_comment_json
FROM post_comments pc
LEFT JOIN users pcu ON (pcu.id = pc.user_id)
WHERE pc.post_id = p.id
ORDER BY pc.id DESC LIMIT 1) AS last_comment
FROM (
SELECT id
FROM posts p
WHERE p.status = 'published'
ORDER BY
(CASE WHEN p.created_at >= unix_timestamp(now() - INTERVAL p.reputation DAY)
THEN +p.reputation ELSE NULL END) DESC,
p.id DESC
LIMIT 0,10
) ids
JOIN posts p ON ids.id = p.id -- mind the join for the p data
LEFT JOIN users u ON (u.id = p.user_id)
LEFT JOIN citys c ON (c.id = p.city_id)
LEFT JOIN towns t ON (t.id = p.town_id)
LEFT JOIN dists d ON (d.id = p.dist_id)
LEFT JOIN post_metas pm ON (pm.post_id = p.id)
;
This is the first step, but even now you can see that you don't need to do those useless LEFT JOINS and json serializations for strings you don't need. (I missed GROUP BY p.id
it because I don't see which LEFT JOIN can result in multiple lines, you are not doing any aggregation).
else to write about:
- indices
- reformulate the CASE clause (use UNION ALL)
- possibly forcing an index
source to share
Here is your problem:
- "ORDER BY expression": the expression must be evaluated for each row of the table, then the sort is performed on the entire table, then the results are passed through LIMIT.
- No index use: "ORDER BY col" when "col" is part of an index can eliminate sorting by going through the index in order. This is very efficient when used with LIMIT. However, it won't work here.
There are ways out of this mess, but you need to specify how many different "reputation" levels you have (for example, 3 or "many") and how they are distributed statistically (for example, 1 user with a reputation of 100, and all others have zero or even distributed).
EDIT
Hmm, there is no information about the statistical distribution of "reputation" or its possible range of values. In this case, let go of the dumb approach:
Add a column "repdate" which contains:
repdate = p.created_at + INTERVAL p.reputation DAY
This corresponds to a shift in posts one day in the future for every reputation point they have. They will then be sorted accordingly. Adjust taste if p.created_at is not DATETIME.
Now we can just "ORDER BY repdate DESC" and with an index on it, it'll be fast.
source to share
select *
from (
SELECT -- everything is ok here
, CASE
WHEN p.created_at >= unix_timestamp(now() - INTERVAL p.reputation DAY)
THEN + p.reputation ELSE NULL END order_col
FROM posts AS p
WHERE p.status = 'published' -- the only thing for filter
LIMIT 0,10 -- limit provided as well
) a
ORDER BY
a.order_col desc
,a.id DESC
source to share
-
Inflate-deflate -
LEFT JOIN
Inflates the number of lines,GROUP BY
then flushes. Excessive row counts are costly. Instead, focus on getting the ids for the desired strings before doing anyJOINing
. If you're lucky, you can get rid ofGROUP BY
. -
WP schema is an EAV schema that sucks when it comes to performance and scaling.
-
What are your indices? See this one for a better meta table.
-
Complex
ORDER BY
. This causes all rows to be collected (after filtering) before sorting and executingLIMIT
. Rethink the proposalORDER BY
if possible.
After you've done the best you can with your suggestions, start another question to continue clarifying. Be sure to include EXPLAIN SELECT ...
and SHOW CREATE TABLE
.
source to share