Limit the number of rows in a group from concatenation (not to 1 row)

Given these tables:

TABLE Stores (
 store_id INT,
 store_name VARCHAR,
 etc
);

TABLE Employees (
 employee_id INT,
 store_id INT,
 employee_name VARCHAR,
 currently_employed BOOLEAN,
 etc
);

      

I want to list the 15 most able-bodied employees for each store (let say 15 the lowest employee_id

), or all of the employees for the store, if their 15 currently_employed='t'

. I want to do this with a connection offer.

I found many examples of people doing this for only 1 line, usually min or max (one longest employee), but I want to basically concatenate ORDER BY

and LIMIT

inside a join. Some of these examples can be found here:

I also found some decent examples for this shopping mall (I don't, I have about 5000 stores):

I also saw that you can use TOP

instead of ORDER BY

and LIMIT

, but not for PostgreSQL.

I believe that suggesting a join between the two tables is not the only (or even necessarily the best way) to do this, if it is possible to just work with store_id

inside the employee table, so I would be open to other approaches. Can always join him.

Since I am very new to SQL, I would like to have some theory or additional explanation that can help me understand how it works.

+3


source to share


2 answers


row_number()

A general solution to get the first n lines for each group is with a window function row_number()

:

SELECT *
FROM  (
   SELECT *, row_number() OVER (PARTITION BY store_id ORDER BY employee_id) AS rn
   FROM   employees
   WHERE  currently_employed
   ) e
JOIN   stores s USING (store_id)
WHERE  rn <= 15
ORDER  BY store_id, e.rn;

      

  • PARTITION BY

    must use store_id

    which is guaranteed to be unique (as opposed to store_name

    ).

  • Define strings in first employees

    and then join for stores

    which is cheaper.

  • To get 15 lines use row_number()

    not rank()

    (would be the wrong tool for this purpose). While employee_id

    unique, the difference is not displayed.

LATERAL

An alternative to Postgres 9.3+ , which usually works better with a match index, especially when getting a small allocation from a large table.



SELECT s.store_name, e.*
FROM   stores s
, LATERAL (
   SELECT *  -- or just needed columns
   FROM   employees
   WHERE  store_id = s.store_id
   AND    currently_employed
   ORDER  BY employee_id
   LIMIT  15
   ) e
-- WHERE ... possibly select only a few stores
ORDER  BY s.store_name, e.store_id, e.employee_id

      

The ideal index would be a partial multi-column index like this:

CREATE INDEX ON employees (store_id, employee_id) WHERE  currently_employed

      

The details depend on the missing details in the question. Example:

Both versions exclude stores without current employees. There are ways to get around this if you need it ...

+4


source


The classic way to do this would be with a window function , for example rank

:



SELECT employee_name, store_name
FROM   (SELECT employee_name, store_name, 
        RANK() OVER (PARTITION BY store_name ORDER BY employee_id ASC) AS rk
        FROM   employees e
        JOIN   stores s ON e.store_id = s.store_id) t
WHERE  rk <= 15

      

+2


source







All Articles