Limit the number of rows in a group from concatenation (not to 1 row)
Given these tables:
TABLE Stores (
store_id INT,
store_name VARCHAR,
etc
);
TABLE Employees (
employee_id INT,
store_id INT,
employee_name VARCHAR,
currently_employed BOOLEAN,
etc
);
I want to list the 15 most able-bodied employees for each store (let say 15 the lowest employee_id
), or all of the employees for the store, if their 15 currently_employed='t'
. I want to do this with a connection offer.
I found many examples of people doing this for only 1 line, usually min or max (one longest employee), but I want to basically concatenate ORDER BY
and LIMIT
inside a join. Some of these examples can be found here:
I also found some decent examples for this shopping mall (I don't, I have about 5000 stores):
I also saw that you can use TOP
instead of ORDER BY
and LIMIT
, but not for PostgreSQL.
I believe that suggesting a join between the two tables is not the only (or even necessarily the best way) to do this, if it is possible to just work with store_id
inside the employee table, so I would be open to other approaches. Can always join him.
Since I am very new to SQL, I would like to have some theory or additional explanation that can help me understand how it works.
source to share
row_number()
A general solution to get the first n lines for each group is with a window function row_number()
:
SELECT *
FROM (
SELECT *, row_number() OVER (PARTITION BY store_id ORDER BY employee_id) AS rn
FROM employees
WHERE currently_employed
) e
JOIN stores s USING (store_id)
WHERE rn <= 15
ORDER BY store_id, e.rn;
-
PARTITION BY
must usestore_id
which is guaranteed to be unique (as opposed tostore_name
). -
Define strings in first
employees
and then join forstores
which is cheaper. -
To get 15 lines use
row_number()
notrank()
(would be the wrong tool for this purpose). Whileemployee_id
unique, the difference is not displayed.
LATERAL
An alternative to Postgres 9.3+ , which usually works better with a match index, especially when getting a small allocation from a large table.
SELECT s.store_name, e.*
FROM stores s
, LATERAL (
SELECT * -- or just needed columns
FROM employees
WHERE store_id = s.store_id
AND currently_employed
ORDER BY employee_id
LIMIT 15
) e
-- WHERE ... possibly select only a few stores
ORDER BY s.store_name, e.store_id, e.employee_id
The ideal index would be a partial multi-column index like this:
CREATE INDEX ON employees (store_id, employee_id) WHERE currently_employed
The details depend on the missing details in the question. Example:
Both versions exclude stores without current employees. There are ways to get around this if you need it ...
source to share
The classic way to do this would be with a window function , for example rank
:
SELECT employee_name, store_name
FROM (SELECT employee_name, store_name,
RANK() OVER (PARTITION BY store_name ORDER BY employee_id ASC) AS rk
FROM employees e
JOIN stores s ON e.store_id = s.store_id) t
WHERE rk <= 15
source to share