Limit the number of rows per group from a join (not to 1 row)

Question

Given these tables:

CREATE TABLE Stores (
 store_id INT,
 store_name VARCHAR
 -- etc.
);

CREATE TABLE Employees (
 employee_id INT,
 store_id INT,
 employee_name VARCHAR,
 currently_employed BOOLEAN
 -- etc.
);

I want to list the first 15 employees for each store (let's say the 15 with the lowest employee_id), or all of the employees for a store if it has fewer than 15 with currently_employed='t'. I want to do this with a join clause.

I found many examples of people doing this for only 1 row, usually a min or max (the single longest-serving employee), but I basically want to combine ORDER BY and LIMIT inside a join. Some of these examples can be found here:

Limit the results of a joined table to one row
MySQL returns 1 image for each product

I also found some decent examples of doing this store by store (which I don't want to do, since I have about 5000 stores):

Get first n records for each group of grouped results

I also saw that you can use TOP instead of ORDER BY and LIMIT, but not in PostgreSQL.

I realize that a join between the two tables is not the only way to do this (or even necessarily the best one) if it is possible to work with just store_id inside the employees table, so I am open to other approaches. I can always join afterwards.

Since I am very new to SQL, I would like to have some theory or additional explanation that can help me understand how it works.

+3

sql join greatest-n-per-group sql-limit postgresql

CMoney, June 10 at 22:06

2 answers

The classic way to do this would be with a window function, for example rank:

SELECT employee_name, store_name
FROM   (SELECT employee_name, store_name, 
        RANK() OVER (PARTITION BY store_name ORDER BY employee_id ASC) AS rk
        FROM   employees e
        JOIN   stores s ON e.store_id = s.store_id) t
WHERE  rk <= 15

+2

Mureinik, June 10 '15 at 22:34

Erwin Brandstetter · Accepted Answer · 2015-06-10T22:42:46+0000

`row_number()`

A general solution to get the first n rows for each group is the window function row_number():

SELECT *
FROM  (
   SELECT *, row_number() OVER (PARTITION BY store_id ORDER BY employee_id) AS rn
   FROM   employees
   WHERE  currently_employed
   ) e
JOIN   stores s USING (store_id)
WHERE  rn <= 15
ORDER  BY store_id, e.rn;

PARTITION BY should use store_id, which is guaranteed to be unique (as opposed to store_name).
Identify the rows in employees first, then join to stores, which is cheaper.
To get 15 rows, use row_number(), not rank(), which would be the wrong tool for this purpose: rank() gives tied rows the same rank, so a filter on it can return more than 15 rows. As long as employee_id is unique, the difference does not show.
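To illustrate the difference between row_number() and rank() noted above, here is a small self-contained sketch. The sample rows and the hire_year column are hypothetical (they are not part of the question's schema); they are only there to create a tie in the sort key, which employee_id would never produce.

```sql
-- Hypothetical sample data: Bob and Cy tie on hire_year.
WITH emp (store_id, employee_name, hire_year) AS (
   VALUES (1, 'Ann', 2010)
        , (1, 'Bob', 2011)
        , (1, 'Cy' , 2011)
        , (1, 'Dee', 2012)
   )
SELECT employee_name, hire_year
     , row_number() OVER w AS rn  -- consecutive numbers, no duplicates
     , rank()       OVER w AS rk  -- tied rows share a rank, then a gap follows
FROM   emp
WINDOW w AS (PARTITION BY store_id ORDER BY hire_year)
ORDER  BY hire_year, employee_name;
```

Here Bob and Cy both get rk = 2, so a filter like rk <= 2 would return three rows, while rn <= 2 returns exactly two (which of the tied rows is picked is arbitrary unless the ORDER BY is made deterministic).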

`LATERAL`

An alternative for Postgres 9.3+, which typically performs better in combination with a matching index, especially when retrieving a small selection from a big table.

What is the difference between LATERAL and a subquery in PostgreSQL?

SELECT s.store_name, e.*
FROM   stores s
, LATERAL (
   SELECT *  -- or just needed columns
   FROM   employees
   WHERE  store_id = s.store_id
   AND    currently_employed
   ORDER  BY employee_id
   LIMIT  15
   ) e
-- WHERE ... possibly select only a few stores
ORDER  BY s.store_name, e.store_id, e.employee_id

The ideal index would be a partial multi-column index like this:

CREATE INDEX ON employees (store_id, employee_id) WHERE currently_employed;

The details depend on information missing from the question. Related example:

Create a unique constraint with null columns

Both versions exclude stores without current employees. There are ways to get around this if you need it ...
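One common way to keep stores without current employees, sketched here as an assumption rather than as what the answer necessarily had in mind, is to switch to an outer join. The queries reuse the table and column names from the question; a store with no matching employees then shows up once, with NULL in the employee columns.

```sql
-- LATERAL variant: LEFT JOIN ... ON true keeps every store.
SELECT s.store_name, e.*
FROM   stores s
LEFT   JOIN LATERAL (
   SELECT *
   FROM   employees
   WHERE  store_id = s.store_id
   AND    currently_employed
   ORDER  BY employee_id
   LIMIT  15
   ) e ON true
ORDER  BY s.store_name, e.employee_id;

-- row_number() variant: the rn filter must move into the join condition,
-- because a WHERE rn <= 15 would discard the NULL-extended rows again.
SELECT s.store_name, e.*
FROM   stores s
LEFT   JOIN (
   SELECT *, row_number() OVER (PARTITION BY store_id ORDER BY employee_id) AS rn
   FROM   employees
   WHERE  currently_employed
   ) e ON e.store_id = s.store_id AND e.rn <= 15
ORDER  BY s.store_name, e.employee_id;
```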
