Table join with isteelf is very slow

I have a table that can be connected to itself. I want to connect it twice. Here's the schema:

CREATE TABLE `route_connections` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `from_route_iid` int(11) NOT NULL,
    `from_service_id` varchar(100) NOT NULL,
    `to_route_iid` int(11) NOT NULL,
    `to_service_id` varchar(100) NOT NULL,
    PRIMARY KEY (`id`),
    KEY `to_route` (`to_route_iid`),
    KEY `from_route` (`from_route_iid`),
    KEY `to_service` (`to_service_id`),
    KEY `from_service` (`from_service_id`),
    KEY `from_to_route` (`from_route_iid`,`to_route_iid`)
) ENGINE=InnoDB AUTO_INCREMENT=6798783 DEFAULT CHARSET=utf8

      

It has about 3.7M lines.

My main goal is to find a path that uses 3 routes (2 route connections) knowing the list of allowed departure and arrival routes (the communication route must be queried).

Track: route A → route B → route C:

  • departure routes (known list, A)
  • route_connection

    c1 (A → B)
  • connecting route (unknown, B)
  • route_connection

    c2 (B → C)
  • arrival routes (known list, C)

So I need to choose three route_iid

s: c1.from

, c1.to

or c2.from

(which is the same), and c2.to

.

Also, I need to filter each one service_id

with the following filter:

service_id in (
    select service_id from (
        select service_id from calendar c
            where c.start_date <= 20141109 and end_date >= 20141109 

        union

        select service_id from calendar_dates cd 
            where cd.date = 20141109 and exception_type = 1 
    ) x 
    where x.service_id not in (
        select service_id from calendar_dates cd 
        where cd.date = 20141109 and exception_type = 2
    )
)

      

At first I work on connecting routes without worrying about filtering service_id

.

When looking for only one connection, the query takes <1ms (zero results):

select c.*
from route_connections c
where c.from_route_iid in (864, 865, 495, 494, 459, 54, 458)
    and c.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)

      

But my goal is to find 2 connections, so I am using this query which takes a long time (zero results also):

select c1.*, c2.*
from
route_connections c1
inner join route_connections c2 on c2.from_route_iid = c1.to_route_iid
    and c2.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)
where c1.from_route_iid in (864, 865, 495, 494, 459, 54, 458)

      

It took 50 seconds, but I added an index from_to_route

which sped up the query to 18-20 seconds.

I also haven't tried using connections:

SELECT ...
FROM route_connections c1, route_connections c2
WHERE ...

      

but it gives exactly the same performance (I assume internally it is exactly the same as joining).

I tried changing the inner join to the left join + a clause HAVING

, but it was much worse (as expected).

I tried to drop all indexes, but these two:

  • PRIMARY KEY ( id

    ),
  • KEY from_route_iid

    ( from_route_iid

    , to_route_iid

    )

The result is the same, about 18-20 s.

Here's an explanation:

+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+
| id | select_type | table | type  | possible_keys                      | key            | key_len | ref                              | rows  | Extra                            |
+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+
|  1 | SIMPLE      | c1    | range | to_route,from_route,from_route_iid | from_route     | 4       | NULL                             | 15464 | Using index condition; Using MRR |
|  1 | SIMPLE      | c2    | ref   | to_route,from_route,from_route_iid | from_route_iid | 4       | bicou_gtfs_paris.c1.to_route_iid |  1746 | Using index condition            |
+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+

      

What is the correct way to connect to a table? Am I missing the index or something else?

The hardware is a 2014 Mac, with a 1.7GHz Core i7, 8GB of RAM, and a 256GB SSD.
Mac OS X 10.10 Yosemite software, with MySQL 5.6.21

+3


source to share


1 answer


Ok, this is how I found the solution:

select to_route_iid
from route_connections
where from_route_iid in (864, 865, 495, 494, 459, 54, 458)

      

=> 15471 lines

select to_route_iid
from route_connections
where from_route_iid in (864, 865, 495, 494, 459, 54, 458)
group by to_route_iid

      

=> 97 lines!



Same for arrival routes, 131 grouped lines versus 25427.

So this query:

select c1.from_route_iid, c2.from_route_iid, c2.to_route_iid
from (
    select from_route_iid, to_route_iid
    from route_connections
    where from_route_iid in (864, 865, 495, 494, 459, 54, 458)
    group by to_route_iid
) c1, route_connections c2
where c2.from_route_iid = c1.to_route_iid
and c2.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)
group by c2.from_route_iid, c2.to_route_iid

      

runs in 145ms. Ok, this morning I started in 2 minutes :)

+1


source







All Articles