Optimizing MySQL Queries

I have the following SQL query:

select expr1, operator, expr2, count(*) as c 
from log_keyword_fulltext 
group by expr1, operator, expr2 
order by c desc limit 2000;

      

Problem: the count(*) as part of my ORDER BY kills my app, possibly because it doesn't use an index. I would like to know if there is a way to do this faster, for example a select inside another select, or something like that.

Here is the EXPLAIN output of my select:

+----+-------------+----------------------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
| id | select_type | table                | type  | possible_keys | key   | key_len | ref  | rows   | Extra                                        |
+----+-------------+----------------------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | log_keyword_fulltext | index | NULL          | expr1 | 208     | NULL | 110000 | Using index; Using temporary; Using filesort | 
+----+-------------+----------------------+-------+---------------+-------+---------+------+--------+----------------------------------------------+

      

UPDATED:

I tried to make a subquery like this:

select * from (select b.expr1,b.operator,b.expr2,count(*) as c 
from log_keyword_fulltext b group by b.expr1,b.operator,b.expr2) x 
order by x.c desc limit 2000;

      

It works, but it is not faster. Here is its EXPLAIN output:

+----+-------------+------------+-------+---------------+-------+---------+------+--------+----------------+
| id | select_type | table      | type  | possible_keys | key   | key_len | ref  | rows   | Extra          |
+----+-------------+------------+-------+---------------+-------+---------+------+--------+----------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL  | NULL    | NULL |  38398 | Using filesort | 
|  2 | DERIVED     | b          | index | NULL          | expr1 | 208     | NULL | 110000 | Using index    | 
+----+-------------+------------+-------+---------------+-------+---------+------+--------+----------------+

      

As you can see, it no longer uses a temporary table, but the performance is still the same. Any recommendation?

0




6 answers


What am I missing? I don't see a WHERE clause. It looks like you are asking for a table scan.



If you are counting on your LIMIT clause to save you, you're out of luck: the COUNT aggregate still has to be calculated for every group before the LIMIT is applied.

+1




You are running a query that has to scan the whole table, and that doesn't scale. There is no WHERE clause, so scanning everything is unavoidable.



Consider maintaining pre-aggregated summary tables instead of running this query frequently.
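
For example, here is a minimal sketch of that idea: a pre-aggregated table that a cron job or MySQL event refreshes periodically, so the application only ever reads a small result set. The table name keyword_counts_summary, the column types, and the refresh schedule are assumptions for illustration, not from the original post.

-- Assumed summary table; adjust column types to match log_keyword_fulltext.
CREATE TABLE keyword_counts_summary (
    expr1    VARCHAR(64)  NOT NULL,
    operator VARCHAR(16)  NOT NULL,
    expr2    VARCHAR(64)  NOT NULL,
    c        INT UNSIGNED NOT NULL,
    PRIMARY KEY (expr1, operator, expr2),
    KEY idx_c (c)
);

-- Refresh periodically (e.g. from cron) instead of aggregating on every request.
TRUNCATE keyword_counts_summary;
INSERT INTO keyword_counts_summary (expr1, operator, expr2, c)
SELECT expr1, operator, expr2, COUNT(*)
FROM log_keyword_fulltext
GROUP BY expr1, operator, expr2;

-- The application then reads the top 2000 rows from the small summary table.
SELECT expr1, operator, expr2, c
FROM keyword_counts_summary
ORDER BY c DESC
LIMIT 2000;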

+2




Always try to count a specific column instead of using count(*), since count(*) considers every column of every row and therefore takes longer. For example:

select expr1, operator, expr2, count(expr1) as c 
from log_keyword_fulltext 
group by expr1, operator, expr2 
order by c desc limit 2000;

      

+1




What do you mean by "killing your application"? What's the context? How often do you run this query? What happens in the database while this query is being executed? Does this specific result need to be real time? What are the conditions (inserts/s, selects/s, db size, etc.)?

Here's what you could do:

  • Store the counts in a separate table that you update with triggers on insert/delete (a sketch follows after this list)

  • If you can't get MySQL to do it in a single table scan with some magic, try a stored procedure that does something like this (pseudocode):

    CREATE TEMP TABLE t (e1 EXP_T, op OP_T, e2 EXP_T, count INTEGER)
    ADD AN INDEX ON count
    FOR EACH ROW OF SELECT exp1, operator, exp2 FROM log_blah DO
           UPDATE t SET count = count + 1 WHERE e1 = exp1 AND op = operator AND e2 = exp2
           IF THE UPDATE MATCHED NO ROW THEN INSERT INTO t VALUES (exp1, operator, exp2, 1)
    DONE
    SELECT * FROM t ORDER BY count DESC LIMIT 2000

Option 1 is probably what you want. And forget about indexes: this query has to go through the whole table anyway.
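
If you go with option 1, a minimal sketch of the trigger approach in MySQL might look like the following. The counter table keyword_counts and its column types are assumptions for illustration; a matching AFTER DELETE trigger would decrement the counts.

-- Assumed counter table, keyed by the grouping columns.
CREATE TABLE keyword_counts (
    expr1    VARCHAR(64) NOT NULL,
    operator VARCHAR(16) NOT NULL,
    expr2    VARCHAR(64) NOT NULL,
    c        INT NOT NULL DEFAULT 0,
    PRIMARY KEY (expr1, operator, expr2)
);

DELIMITER //
CREATE TRIGGER log_keyword_fulltext_ai
AFTER INSERT ON log_keyword_fulltext
FOR EACH ROW
BEGIN
    -- Upsert: create the counter at 1, or bump the existing one.
    INSERT INTO keyword_counts (expr1, operator, expr2, c)
    VALUES (NEW.expr1, NEW.operator, NEW.expr2, 1)
    ON DUPLICATE KEY UPDATE c = c + 1;
END//
DELIMITER ;

The top-2000 query then runs against keyword_counts instead of scanning the log table.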

0




The best way to prevent table scans is to add a covering index for the fields that you regularly access. There is a one-time cost to create the index, plus some additional overhead on INSERT and DELETE operations on the table so that the index can be kept up to date.

A covering index lets the database avoid reading the entire record into memory just to get at the few fields you care about: the whole query can be answered from the index alone.

ALTER TABLE log_keyword_fulltext
  ADD INDEX idx_name (expr1, operator, expr2);

If these are not actual columns but rather expressions on a column, such as left(foo, 20), you can index just the portion of the column that you will use in future SELECT or WHERE clauses.
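
For instance, a sketch of such a prefix index, assuming expr1 is a string column of which only the first 20 characters are ever used in lookups:

-- Index only the first 20 characters of expr1 (a prefix index).
ALTER TABLE log_keyword_fulltext
  ADD INDEX idx_expr1_prefix (expr1(20));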

See this page for other optimization ideas: http://dev.mysql.com/doc/refman/5.0/en/query-speed.html

0




Trying to count and sort on the count in the same query is a killer. I would suggest creating a temporary table with the counts, and then doing a select ... order by on that.

Not sure if this works in MySQL, but in PostgreSQL or Oracle it would:

create table foo as 
   select expr1, operator, expr2, count(*) as c
   from log_keyword_fulltext 
   group by expr1, operator, expr2;
select * from foo order by c desc limit 2000;
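
For what it's worth, MySQL does support CREATE TEMPORARY TABLE ... SELECT, so the same idea can be written as follows (keeping the table name foo from the example above):

-- Materialize the counts once, then sort the much smaller result.
CREATE TEMPORARY TABLE foo AS
    SELECT expr1, operator, expr2, COUNT(*) AS c
    FROM log_keyword_fulltext
    GROUP BY expr1, operator, expr2;

SELECT * FROM foo ORDER BY c DESC LIMIT 2000;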

      

Also, every count has to be computed before the rows can be sorted, so the LIMIT clause does not save you from doing all of that work.

-1








