Table join with isteelf is very slow
I have a table that can be connected to itself. I want to connect it twice. Here's the schema:
CREATE TABLE `route_connections` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`from_route_iid` int(11) NOT NULL,
`from_service_id` varchar(100) NOT NULL,
`to_route_iid` int(11) NOT NULL,
`to_service_id` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
KEY `to_route` (`to_route_iid`),
KEY `from_route` (`from_route_iid`),
KEY `to_service` (`to_service_id`),
KEY `from_service` (`from_service_id`),
KEY `from_to_route` (`from_route_iid`,`to_route_iid`)
) ENGINE=InnoDB AUTO_INCREMENT=6798783 DEFAULT CHARSET=utf8
It has about 3.7M lines.
My main goal is to find a path that uses 3 routes (2 route connections) knowing the list of allowed departure and arrival routes (the communication route must be queried).
Track: route A → route B → route C:
- departure routes (known list, A)
-
route_connection
c1 (A → B) - connecting route (unknown, B)
-
route_connection
c2 (B → C) - arrival routes (known list, C)
So I need to choose three route_iid
s: c1.from
, c1.to
or c2.from
(which is the same), and c2.to
.
Also, I need to filter each one service_id
with the following filter:
service_id in (
select service_id from (
select service_id from calendar c
where c.start_date <= 20141109 and end_date >= 20141109
union
select service_id from calendar_dates cd
where cd.date = 20141109 and exception_type = 1
) x
where x.service_id not in (
select service_id from calendar_dates cd
where cd.date = 20141109 and exception_type = 2
)
)
At first I work on connecting routes without worrying about filtering service_id
.
When looking for only one connection, the query takes <1ms (zero results):
select c.*
from route_connections c
where c.from_route_iid in (864, 865, 495, 494, 459, 54, 458)
and c.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)
But my goal is to find 2 connections, so I am using this query which takes a long time (zero results also):
select c1.*, c2.*
from
route_connections c1
inner join route_connections c2 on c2.from_route_iid = c1.to_route_iid
and c2.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)
where c1.from_route_iid in (864, 865, 495, 494, 459, 54, 458)
It took 50 seconds, but I added an index from_to_route
which sped up the query to 18-20 seconds.
I also haven't tried using connections:
SELECT ...
FROM route_connections c1, route_connections c2
WHERE ...
but it gives exactly the same performance (I assume internally it is exactly the same as joining).
I tried changing the inner join to the left join + a clause HAVING
, but it was much worse (as expected).
I tried to drop all indexes, but these two:
- PRIMARY KEY (
id
), - KEY
from_route_iid
(from_route_iid
,to_route_iid
)
The result is the same, about 18-20 s.
Here's an explanation:
+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+
| 1 | SIMPLE | c1 | range | to_route,from_route,from_route_iid | from_route | 4 | NULL | 15464 | Using index condition; Using MRR |
| 1 | SIMPLE | c2 | ref | to_route,from_route,from_route_iid | from_route_iid | 4 | bicou_gtfs_paris.c1.to_route_iid | 1746 | Using index condition |
+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+
What is the correct way to connect to a table? Am I missing the index or something else?
The hardware is a 2014 Mac, with a 1.7GHz Core i7, 8GB of RAM, and a 256GB SSD.
Mac OS X 10.10 Yosemite software, with MySQL 5.6.21
source to share
Ok, this is how I found the solution:
select to_route_iid
from route_connections
where from_route_iid in (864, 865, 495, 494, 459, 54, 458)
=> 15471 lines
select to_route_iid
from route_connections
where from_route_iid in (864, 865, 495, 494, 459, 54, 458)
group by to_route_iid
=> 97 lines!
Same for arrival routes, 131 grouped lines versus 25427.
So this query:
select c1.from_route_iid, c2.from_route_iid, c2.to_route_iid
from (
select from_route_iid, to_route_iid
from route_connections
where from_route_iid in (864, 865, 495, 494, 459, 54, 458)
group by to_route_iid
) c1, route_connections c2
where c2.from_route_iid = c1.to_route_iid
and c2.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)
group by c2.from_route_iid, c2.to_route_iid
runs in 145ms. Ok, this morning I started in 2 minutes :)
source to share