Why is adding an INNER JOIN making this query so slow?
I have a database with the following three tables:
The matches table has 200,000 matches ...
CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The heroes table has ~100 heroes ...
CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The matches_heroes table has 2,000,000 relationships (10 random heroes per match) ...
CREATE TABLE `matches_heroes` (
`relation_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`relation_id`),
KEY `match_id` (`match_id`),
KEY `hero_id` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`)
REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`)
REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=3689891 DEFAULT CHARSET=utf8
The following query takes 1 second, which seems pretty slow to me for something this simple:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5
Removing only the WHERE clause doesn't help, but if I also pull out the INNER JOIN, like this:
SELECT SQL_NO_CACHE COUNT(*) AS match_count FROM matches
... it only takes 0.05 seconds. INNER JOIN seems to be very expensive. I don't have much experience with joins. Is this normal or am I doing something wrong?
UPDATE #1: Result of EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE matches_heroes ref match_id,hero_id,match_id_hero_id hero_id 2 const 34742
1 SIMPLE matches eq_ref PRIMARY PRIMARY 8 mydatabase.matches_heroes.match_id 1 Using index
UPDATE #2: After listening to you guys, I think it is working correctly and is as fast as it gets. Please let me know if you disagree. Thanks for the help, I really appreciate it.
Use COUNT(matches.match_id) instead of COUNT(*), since it is best not to use * when joining, as it does additional computation. Counting a column from the join is the best way to make sure you are not asking for any extra operations. (Edit: this turns out not to be a problem with MySQL's INNER JOIN after all, my mistake.)
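For example, here is the query from the question with that change applied; only the COUNT expression differs, the tables and filter stay the same:
SELECT SQL_NO_CACHE COUNT(matches.match_id) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5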
Also make sure your keys are not fragmented and that there is enough free memory to hold the indexes in RAM.
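A minimal sketch of how you could check this (standard MySQL statements; matches_heroes is the table from the question):
-- Rebuild the table to defragment its indexes (InnoDB maps this to a table rebuild)
OPTIMIZE TABLE matches_heroes;
-- Compare the index size against the buffer that is supposed to cache it
SHOW TABLE STATUS LIKE 'matches_heroes';        -- look at the Index_length column
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';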
Update 1:
Try adding a composite index on (match_id, hero_id), as it should give better performance:
ALTER TABLE `matches_heroes` ADD KEY `match_id_hero_id` (`match_id`,`hero_id`)
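After adding it, you can wrap the original query in EXPLAIN to see whether the optimizer actually picks the new key (this is just the query from the question, nothing new):
EXPLAIN SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5;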
Update 2:
I was not happy with the accepted answer's conclusion that MySQL is simply slow for only 2 million records, so I ran a benchmark on my Ubuntu machine (i7 processor with a standard hard drive).
-- pre-requirements
CREATE TABLE seq_numbers (
number INT NOT NULL
) ENGINE = MYISAM;
DELIMITER $$
CREATE PROCEDURE InsertSeq(IN MinVal INT, IN MaxVal INT)
BEGIN
DECLARE i INT;
SET i = MinVal;
START TRANSACTION;
WHILE i <= MaxVal DO
INSERT INTO seq_numbers VALUES (i);
SET i = i + 1;
END WHILE;
COMMIT;
END$$
DELIMITER ;
CALL InsertSeq(1,200000)
;
ALTER TABLE seq_numbers ADD PRIMARY KEY (number)
;
-- create tables
-- DROP TABLE IF EXISTS `matches`
CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
CREATE TABLE `matches_heroes` (
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`match_id`,`hero_id`),
KEY (match_id),
KEY (hero_id),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`) REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`) REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
-- insert DATA
-- 100
INSERT INTO heroes(name)
SELECT SUBSTR(CONCAT(char(RAND()*25+65),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97)),1,RAND()*9+4) as RandomName
FROM seq_numbers WHERE number <= 100
-- 200000
INSERT INTO matches(start_time)
SELECT rand()*1000000
FROM seq_numbers WHERE number <= 200000
-- 2000000
INSERT INTO matches_heroes(hero_id,match_id)
SELECT a.hero_id, b.match_id
FROM heroes as a
INNER JOIN matches as b ON 1=1
LIMIT 2000000
-- warm-up database, load INDEXes in ram (optional, works only for MyISAM tables)
LOAD INDEX INTO CACHE matches_heroes,matches,heroes
-- get random hero_id
SET @randHeroId=(SELECT hero_id FROM matches_heroes ORDER BY rand() LIMIT 1);
-- test 1
SELECT SQL_NO_CACHE @randHeroId,COUNT(*) AS match_count
FROM matches as a
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
WHERE b.hero_id = @randHeroId
; -- Time: 0.039s
-- test 2: adding some complexity
SET @randName = (SELECT `name` FROM heroes WHERE hero_id = @randHeroId LIMIT 1);
SELECT SQL_NO_CACHE @randName, COUNT(*) AS match_count
FROM matches as a
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
INNER JOIN heroes as c ON b.hero_id = c.hero_id
WHERE c.name = @randName
; -- Time: 0.037s
Conclusion: My benchmark results are about 20x faster, even though my server load was around 80% before testing (this is not a dedicated MySQL server and other CPU-intensive tasks were running). So if you run the whole script above and get slower results, it may be because:
- you are on a shared host and the load is too high. In that case there is not much you can do: complain to your current host, pay for a better host/VM, or try a different host.
- the configured key_buffer_size (for MyISAM) or innodb_buffer_pool_size (for InnoDB) is too small; the optimal size here is more than 150 MB (see the check after this list).
- your available RAM is not enough; it takes about 100-150 MB of RAM to load the indexes into memory. Solution: free up some RAM or buy more.
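A quick way to check the relevant buffer sizes (standard MySQL statements; the 256 MB value below is only an illustrative assumption, size it to your own machine):
SHOW VARIABLES LIKE 'key_buffer_size';          -- MyISAM index cache
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- InnoDB data + index cache
-- key_buffer_size can be raised at runtime (requires the SUPER privilege);
-- innodb_buffer_pool_size usually has to be changed in the server configuration
SET GLOBAL key_buffer_size = 256 * 1024 * 1024;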
Note that because the test script generates fresh data, index fragmentation is not an issue here. Hope this helps; ask if you run into any problems testing this.
Observation:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5
is equivalent to:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches_heroes
WHERE hero_id = 5
This way, you don't need the join at all if you only want a count, but I assume that was just an example.
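The join can be dropped because of the foreign key: every match_id in matches_heroes must exist in matches, so the join never changes the row count. A quick sanity check, using only the tables from the question, should return 0:
SELECT COUNT(*) AS orphan_rows
FROM matches_heroes
LEFT JOIN matches ON matches.match_id = matches_heroes.match_id
WHERE matches.match_id IS NULL;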
So you're saying that reading a table of 200,000 records is faster than reading a table of 2,000,000 records, filtering out the right ones, and then joining them all against the 200,000-record table to find the matching records?
Does that surprise you? That is simply a lot of work for the DBMS. (It may be, by the way, that the DBMS decides not to use the hero_id index if it thinks a full table scan is faster.)
So, in my opinion, there is nothing wrong with what is happening here.