How do I select N random rows using pure SQL?

How can I combine "How to request a random row in SQL?" and "Multiple random values in SQL Server 2005" to select N random rows using a single pure-SQL query? Ideally, I would like to avoid stored procedures. Is this even possible?

CLARIFICATIONS

  • "Pure SQL" means as close to the ANSI / ISO standard as possible.
  • The solution must be "reasonably efficient". ORDER BY RAND() may work, but as others have pointed out, it is not feasible for medium-sized tables.
+8




5 answers


I don't know about pure ANSI and it's not easy, but you can check my answer to a similar question here: Simple random samples from Sql database



+2




The answer to your question is in the second link:

SELECT * FROM table ORDER BY RAND() LIMIT 1

      

Just change the limit and/or rewrite it for SQL Server:

SELECT TOP 1 * FROM table ORDER BY newid()

      

Now this strictly answers your question, but you really shouldn't use this solution. Just try it on a big table and you will see what I mean.
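As a rough illustration of "just change the limit", here is a minimal sketch that runs the N-row variant against an in-memory SQLite database from Python. The `items` table, its columns, and the value of `N` are assumptions made for the example; SQLite spells the function `RANDOM()`, whereas MySQL uses `RAND()` and SQL Server uses `NEWID()`.

```python
import sqlite3

N = 3  # number of random rows wanted (illustrative value)

conn = sqlite3.connect(":memory:")
# Hypothetical demo table, not part of the original question.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [(f"row{i}",) for i in range(1, 101)],
)

# Every row gets a random sort key, the whole table is sorted,
# and only the first N rows are kept.
rows = conn.execute(
    "SELECT * FROM items ORDER BY RANDOM() LIMIT ?", (N,)
).fetchall()
print(rows)
```

Because every row must receive a sort key before the limit applies, the cost grows with the table size, which is the concern raised above.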



If your key space is sequential, with no holes or very few holes, and you are not too concerned that some rows have a slightly higher chance of being picked than others, then you can use a variation: pick a random key between 1 and the highest key in your table, then retrieve the first row whose key is equal to or greater than that number. You only need the "or greater than" part if there are holes in your key space.

This SQL is left as an exercise for the reader.
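One possible shape of that exercise, as a minimal sketch rather than a definitive answer: it assumes a hypothetical `items` table with an integer primary key `id`, uses SQLite through Python, and picks the random key in the host language instead of in SQL (since, as noted in this thread, random functions differ between engines). The helper names `row_at_or_after`, `random_row`, and `random_rows` are made up for the example.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical demo table with a few holes in the key space (10, 11, 50).
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 101) if i not in (10, 11, 50)],
)

def row_at_or_after(conn, target):
    """Return the first row whose key is >= target (index lookup, no full sort)."""
    return conn.execute(
        "SELECT * FROM items WHERE id >= ? ORDER BY id LIMIT 1", (target,)
    ).fetchone()

def random_row(conn):
    """Pick a random key in 1..MAX(id) and fetch the row at or after it."""
    max_id = conn.execute("SELECT MAX(id) FROM items").fetchone()[0]
    return row_at_or_after(conn, random.randint(1, max_id))

def random_rows(conn, n):
    """Repeat the single-row lookup until n distinct rows are collected."""
    total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    if n > total:
        raise ValueError("table has fewer than n rows")
    picked = {}
    while len(picked) < n:
        row = random_row(conn)
        picked[row[0]] = row  # keyed by id, so duplicates are ignored
    return list(picked.values())

sample = random_rows(conn, 5)
print(sample)
```

In this sketch a row that follows a hole is picked more often than its neighbours (here, id 12 is returned for targets 10, 11, and 12), which is the slight bias the answer warns about.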


Edit: Note that a comment on another answer here mentions that "pure SQL" may mean ANSI standard SQL. If so, then there is no way to do this, because there is no standardized random function, and database engines do not all treat the random number function the same way. At least one engine I've seen "optimizes" the call by calling it once and simply repeating the computed value for all rows.

+5




Here's a potential solution that lets you balance the risk of getting fewer than N rows against a sampling bias toward the "front" of the table. This assumes that N is small compared to the size of the table:

select * from table where random() < (N * 1.0 / (select count(1) from table)) limit N;

      

This will generally sample from most of the table, but it may return fewer than N rows. If some bias toward the front of the table is acceptable, the numerator can be changed from N to 1.5 * N or 2 * N so that it is very likely that N rows will be returned. Also, if you need to randomize the order of the rows, rather than just pick an arbitrary subset:

select * from (select * from table
                where random() < (N * 1.0 / (select count(1) from table)) limit N) as sample
 order by mod(tableid,1111);

      

The downside to this solution is that, at least in PostgreSQL, it uses a sequential table scan. A larger numerator will speed up the query, because the limit is reached sooner.
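To make the trade-off concrete, here is a minimal sketch of the same filter run against SQLite from Python. Everything named here is an assumption for the example: the `items` table, the values of `N` and `K` (the oversampling factor applied to the numerator), and `rand01`, a user-defined SQL function registered from Python because SQLite's built-in `random()` returns an integer rather than a value in [0, 1).

```python
import random
import sqlite3

N = 10   # rows wanted (illustrative)
K = 2.0  # oversampling factor for the numerator (illustrative)

conn = sqlite3.connect(":memory:")
# Hypothetical demo table with 10,000 rows.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [(f"row{i}",) for i in range(1, 10001)],
)

# rand01() is a made-up helper returning a float in [0, 1), evaluated per row.
conn.create_function("rand01", 0, random.random)

# Each row passes the filter with probability K * N / COUNT(*);
# the scan stops as soon as N rows have passed.
rows = conn.execute(
    """
    SELECT * FROM items
    WHERE rand01() < (:k * :n * 1.0 / (SELECT COUNT(*) FROM items))
    LIMIT :n
    """,
    {"k": K, "n": N},
).fetchall()
print(len(rows), rows)
```

With `K = 1.0` the query expects about N matches over the whole table and so often comes up short; raising `K` makes a full result more likely, at the price of favouring rows near the start of the scan.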

+1




This might help you:

SELECT TOP 3 * FROM TABLE ORDER BY NEWID()

      

-1




Using the code below you can achieve what you are looking for.

select top 1 * from student1 order by newid()

      

Replace the 1 after top with N to get N random rows.

-2








