Select a random record from a table: why is the first query slower than the second?

Question

I want to select a random record from a large table. After searching, I finally found two solutions:

a:

select id  from `table` where id = (floor(1 + rand() * 2880000));

b:

select id  from `table` where id >= (floor(1 + rand() * 2880000)) limit 1;

But solution (a) is much slower than solution (b), about 40 times slower.

After running it many times, I found a stranger problem: the first solution sometimes returns two records.

select id  from `table` where id = (floor(1 + rand() *  2880000));
+---------+
| id      |
+---------+
| 2484024 |
| 1425029 |
+---------+
2 rows in set (1.06 sec)

My question is:

Why is the first solution slower than the second?
Why did the first solution return two records?

My MySQL version:

mysql> show variables like "%version%";
+-------------------------+-------------------------+
| Variable_name           | Value                   |
+-------------------------+-------------------------+
| innodb_version          | 5.5.43                  |
| protocol_version        | 10                      |
| slave_type_conversions  |                         |
| version                 | 5.5.43-0ubuntu0.12.04.1 |
| version_comment         | (Ubuntu)                |
| version_compile_machine | x86_64                  |
| version_compile_os      | debian-linux-gnu        |
+-------------------------+-------------------------+
7 rows in set (0.04 sec)

Thanks for any help.

sql mysql

Yejing 07 Aug 15 at 3:51

2 answers



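-- The derived table x yields a single row, so RAND() is evaluated once
-- there instead of once per row of tableA.
-- r falls between MIN(id) - 1 and MAX(id) - 1.
SELECT
    a.id
FROM
    tableA a
        INNER JOIN
    (SELECT
        (ROUND((RAND() * (MAX(id) - MIN(id))) + MIN(id)) - 1) r
    FROM
        tableA) x
WHERE
    a.id > x.r
-- Make "the first id above r" explicit rather than relying on scan order.
ORDER BY a.id
LIMIT 1;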

Ricardo sismeiro 12 Aug 15 at 11:50 pm

Robby cornelissen · Accepted Answer · 2015-08-07T03:54:17+0000

Answers to both questions:

The first solution is slower than the second because RAND() is evaluated again for every row. Without a LIMIT, the first query has to compare every row in the table against a fresh random value, whereas the second stops as soon as it has found one matching row. The condition in the second query (>=) is also much easier to satisfy than an exact match, so a match turns up after only a few rows.
The first solution can return multiple rows because a new random value is calculated for each row and there is no LIMIT clause: every row whose id happens to equal its own random value is returned. By the same logic, it can also return 0 rows.
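
One way to avoid the per-row evaluation, as a minimal sketch: compute the random value once, store it in a user variable, and compare against that. This assumes, as in the question, that ids run roughly from 1 to 2880000 and that `id` is the primary key or otherwise indexed (the question does not say). The variable name `@r` is made up for illustration.

```sql
-- Evaluate RAND() exactly once and keep the result.
-- @r is a hypothetical name; the CAST makes it an integer like id.
SET @r := CAST(FLOOR(1 + RAND() * 2880000) AS UNSIGNED);

-- @r does not change while the statement runs, so this should be
-- resolvable with an index lookup instead of a per-row comparison.
-- Returns at most one row, or none if that id does not exist.
SELECT id FROM `table` WHERE id = @r;

-- Variant that tolerates gaps in the id sequence: take the first id
-- at or above the random value.
SELECT id FROM `table` WHERE id >= @r ORDER BY id LIMIT 1;

-- To see how the original query is executed, inspect its plan.
EXPLAIN SELECT id FROM `table` WHERE id = (FLOOR(1 + RAND() * 2880000));
```

With gaps in the id sequence, the second variant is not perfectly uniform: an id that follows a large gap is picked more often than its neighbours.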

Check out this answer for a better solution.
