Why is adding an INNER JOIN making this query so slow?
I have a database with the following three tables:
The matches table has 200,000 matches ...
CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The heroes table has ~100 heroes ...
CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The matches_heroes table has 2,000,000 relationships (10 random heroes per match) ...
CREATE TABLE `matches_heroes` (
`relation_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`relation_id`),
KEY `match_id` (`match_id`),
KEY `hero_id` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`)
REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`)
REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=3689891 DEFAULT CHARSET=utf8
The following query takes 1 second, which seems pretty slow to me for something this simple:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5
Removing only the WHERE clause doesn't help, but if I also pull out the INNER JOIN, like this:
SELECT SQL_NO_CACHE COUNT(*) AS match_count FROM matches
... it only takes 0.05 seconds. INNER JOIN seems to be very expensive. I don't have much experience with joins. Is this normal or am I doing something wrong?
UPDATE #1: Result of EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE matches_heroes ref match_id,hero_id,match_id_hero_id hero_id 2 const 34742
1 SIMPLE matches eq_ref PRIMARY PRIMARY 8 mydatabase.matches_heroes.match_id 1 Using index
UPDATE #2: After listening to you guys, I think it is working correctly and is as fast as it gets. Please let me know if you disagree. Thanks for the help, I really appreciate it.
Use COUNT(matches.match_id) instead of COUNT(*), since it is best not to use * when joining, as it does additional computation. Counting a column from the join is the best way to make sure you are not asking for any extra operations. (Edit: this turns out not to be a problem with MySQL's INNER JOIN after all, my mistake.)
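For example, here is the query from the question with that change applied; only the COUNT expression differs, the tables and filter stay the same:
SELECT SQL_NO_CACHE COUNT(matches.match_id) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5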
Also make sure your keys are not fragmented and that there is enough free memory to hold the indexes in RAM.
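A minimal sketch of how you could check this (standard MySQL statements; matches_heroes is the table from the question):
-- Rebuild the table to defragment its indexes (InnoDB maps this to a table rebuild)
OPTIMIZE TABLE matches_heroes;
-- Compare the index size against the buffer that is supposed to cache it
SHOW TABLE STATUS LIKE 'matches_heroes';        -- look at the Index_length column
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';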
Update 1:
Try adding a composite index on (match_id, hero_id), as it should give better performance:
ALTER TABLE `matches_heroes` ADD KEY `match_id_hero_id` (`match_id`,`hero_id`)
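After adding it, you can wrap the original query in EXPLAIN to see whether the optimizer actually picks the new key (this is just the query from the question, nothing new):
EXPLAIN SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5;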
Update 2:
I was not happy with the accepted answer's conclusion that MySQL is simply slow for only 2 million records, so I ran a benchmark on my Ubuntu machine (i7 processor with a standard hard drive).
-- pre-requirements
CREATE TABLE seq_numbers (
number INT NOT NULL
) ENGINE = MYISAM;
DELIMITER $$
CREATE PROCEDURE InsertSeq(IN MinVal INT, IN MaxVal INT)
BEGIN
DECLARE i INT;
SET i = MinVal;
START TRANSACTION;
WHILE i <= MaxVal DO
INSERT INTO seq_numbers VALUES (i);
SET i = i + 1;
END WHILE;
COMMIT;
END$$
DELIMITER ;
CALL InsertSeq(1,200000)
;
ALTER TABLE seq_numbers ADD PRIMARY KEY (number)
;
-- create tables
-- DROP TABLE IF EXISTS `matches`
CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
CREATE TABLE `matches_heroes` (
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`match_id`,`hero_id`),
KEY (match_id),
KEY (hero_id),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`) REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`) REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
-- insert DATA
-- 100
INSERT INTO heroes(name)
SELECT SUBSTR(CONCAT(char(RAND()*25+65),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97)),1,RAND()*9+4) as RandomName
FROM seq_numbers WHERE number <= 100
-- 200000
INSERT INTO matches(start_time)
SELECT rand()*1000000
FROM seq_numbers WHERE number <= 200000
-- 2000000
INSERT INTO matches_heroes(hero_id,match_id)
SELECT a.hero_id, b.match_id
FROM heroes as a
INNER JOIN matches as b ON 1=1
LIMIT 2000000
-- warm-up database, load INDEXes in ram (optional, works only for MyISAM tables)
LOAD INDEX INTO CACHE matches_heroes,matches,heroes
-- get random hero_id
SET @randHeroId=(SELECT hero_id FROM matches_heroes ORDER BY rand() LIMIT 1);
-- test 1
SELECT SQL_NO_CACHE @randHeroId,COUNT(*) AS match_count
FROM matches as a
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
WHERE b.hero_id = @randHeroId
; -- Time: 0.039s
-- test 2: adding some complexity
SET @randName = (SELECT `name` FROM heroes WHERE hero_id = @randHeroId LIMIT 1);
SELECT SQL_NO_CACHE @randName, COUNT(*) AS match_count
FROM matches as a
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
INNER JOIN heroes as c ON b.hero_id = c.hero_id
WHERE c.name = @randName
; -- Time: 0.037s
Conclusion: My benchmark results are about 20x faster, even though my server load was around 80% before testing (this is not a dedicated MySQL server and other CPU-intensive tasks were running). So if you run the whole script above and get slower results, it may be because:
- you are on a shared host and the load is too high. In that case there is not much you can do: complain to your current host, pay for a better host/VM, or try a different host.
- the configured key_buffer_size (for MyISAM) or innodb_buffer_pool_size (for InnoDB) is too small; the optimal size here is more than 150 MB (see the check after this list).
- your available RAM is not enough; it takes about 100-150 MB of RAM to load the indexes into memory. Solution: free up some RAM or buy more.
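A quick way to check the relevant buffer sizes (standard MySQL statements; the 256 MB value below is only an illustrative assumption, size it to your own machine):
SHOW VARIABLES LIKE 'key_buffer_size';          -- MyISAM index cache
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- InnoDB data + index cache
-- key_buffer_size can be raised at runtime (requires the SUPER privilege);
-- innodb_buffer_pool_size usually has to be changed in the server configuration
SET GLOBAL key_buffer_size = 256 * 1024 * 1024;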
Note that because the test script generates fresh data, index fragmentation is not an issue here. Hope this helps; ask if you run into any problems testing this.
Observation:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5
is equivalent to:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches_heroes
WHERE hero_id = 5
This way, you don't need the join at all if you only want a count, but I assume that was just an example.
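The join can be dropped because of the foreign key: every match_id in matches_heroes must exist in matches, so the join never changes the row count. A quick sanity check, using only the tables from the question, should return 0:
SELECT COUNT(*) AS orphan_rows
FROM matches_heroes
LEFT JOIN matches ON matches.match_id = matches_heroes.match_id
WHERE matches.match_id IS NULL;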
So you're saying that reading a table of 200,000 records is faster than reading a table of 2,000,000 records, filtering out the right ones, and then joining them all against the 200,000-record table to find the matching records?
Does that surprise you? That is simply a lot of work for the DBMS. (It may be, by the way, that the DBMS decides not to use the hero_id index if it thinks a full table scan is faster.)
So, in my opinion, there is nothing wrong with what is happening here.