MySQL vs SQL Server Express Performance Comparison

I have a somewhat complex query with about 100k rows.

The request is executed in 13 seconds in SQL Server Express (executed in my dev block)

The same query with the same indexes and tables takes over 15 minutes to run in MySQL 5.1 (running on my production box - much more powerful and tested with 100% resources). Sometimes a request crashes out of the machine with a memory error.

What am I doing wrong in MySQL? Why is it taking so long?

select e8.*
from table_a e8
inner join (
    select max(e6.id) as id, e6.category, e6.entity, e6.service_date
    from (
        select e4.* 
        from table_a e4
        inner join (
            select max(e2.id) as id, e3.rank, e2.entity, e2.provider_id, e2.service_date
            from table_a e2
            inner join (
                select min(e1.rank) as rank, e1.entity, e1.provider_id, e1.service_date
                from table_a e1
                where e1.site_id is not null
                group by e1.entity, e1.provider_id, e1.service_date 
            ) as e3
            on e2.rank= e3.rank
            and e2.entity = e3.entity
            and e2.provider_id = e3.provider_id
            and e2.service_date = e3.service_date
            and e2.rank= e3.rank
            group by e2.entity, e2.provider_id, e2.service_date, e3.rank
        ) e5
        on e4.id = e5.id
        and e4.rank= e5.rank                            
    ) e6
    group by e6.category, e6.entity, e6.service_date 
) e7
on e8.id = e7.id and e7.category = e8.category

      

+1


source to share


5 answers


This answer, which I originally tried to post in your deleted question, did not indicate that this is a MySQL issue. I will still continue to use SQL Server to refactor the query with CTEs and then convert back to nested queries (if they remain). Sorry for the formatting, Jeff Atwood sent me the original text and I had to reformat it again.

It's hard to get by without data, expected results, and good names, but I would convert all nested queries to CTEs, add them up, name them meaningfully and refactorically - starting by excluding columns you don't use. Removing the columns will not lead to improvement because the optimizer is pretty smart, but it will give you the opportunity to improve your query - perhaps by playing out some or all of the CTEs. I'm not sure what your code is doing, but you might find the new functions like RANK () useful because it looks like you are using a reverse lookup type with all these self-joins.



So, start here. I looked at the improvements to e7 for you, columns not used in e7 could indicate either a defect or incomplete thinking about the grouping possibilities, but if those columns are really not needed then it could seep completely through your logic in e6, e5 and e3 ... If the grouping in e7 is correct, you can exclude everything except max (id) in the results and the union. I don't understand why you would have multiple MAX (id) for each category, because that would multiply your results when joining, so MAX (id) must be unique in the category, in which case the category is redundant in the join.

WITH e3 AS (
select min(e1.rank) as rank,
e1.entity,
e1.provider_id,
e1.service_date
from table_a e1
where e1.site_id is not null
group by e1.entity, e1.provider_id, e1.service_date
)

,e5 AS (
select max(e2.id) as id,
e3.rank,
e2.entity,
e2.provider_id,
e2.service_date
from table_a e2
inner join e3
on e2.rank= e3.rank
and e2.entity = e3.entity
and e2.provider_id = e3.provider_id
and e2.service_date = e3.service_date
and e2.rank= e3.rank
group by e2.entity, e2.provider_id, e2.service_date, e3.rank
)

,e6 AS (
select e4.* -- switch from * to only the columns you are actually using
from table_a e4
inner join e5
on e4.id = e5.id
and e4.rank= e5.rank
)

,e7 AS (
select max(e6.id) as id, e6.category -- unused, e6.entity, e6.service_date
from e6
group by e6.category, e6.entity, e6.service_date
-- This instead
-- select max(e6.id) as id
-- from e6
-- group by e6.category, e6.entity, e6.service_date
)

select e8.*
from table_a e8
inner join e7
on e8.id = e7.id
and e7.category = e8.category
-- THIS INSTEAD on e8.id = e7.id

      

+2


source


100,000 rows shouldn't take 13 seconds if there were effective indexes. I suspect the difference is due to the fact that SQL Server has a much more robust query optimizer than MySQL. That MySQL has more order of SQL Parser than optimizer.

To get started, you need to provide a lot more information - complete schemas of all participating tables and a complete list of indexes for each.



Then some idea of ​​what the data represents and what the query should do. Something in order of use.

+1


source


It would be interesting to DECLARE THE PLAN to see what the differences are. I'm not sure if this is an apple versus orange comparison, but I would be curious.

I don't know if it can , but it was the first hit on the search for "mysql query optimizer".

+1


source


Here's another one that might be helpful.

0


source


The only open source database I know of who has CTEs is Firebird ( http://www.firebirdsql.org/rlsnotesh/rlsnotes210.html#rnfb210-cte )

Postgres will be in 8.4. I think

0


source







All Articles