Minimum number of lines in a group
Below is my request:
SELECT *
FROM (
SELECT f.max, f.min, p.user_id, p.id, p.title, p.rating,
RANK() OVER (
PARTITION BY p.user_id
ORDER BY p.rating DESC, p.id DESC
) AS rnk
FROM posts AS p
INNER JOIN friends AS f ON (p.user_id = f.friend_id)
WHERE f.user_id=1
) AS subq
WHERE (subq.rnk <= subq.max)
LIMIT 10
It looks for messages from my friends, sorted by rating and date. The window feature implemented in this query allows me to limit the number of rows returned for each friend according to the field MAX
in the table Friends
.
However, I also have a field MIN
that is used to specify the minimum number of messages I want to receive from a request for a given friend. How is this possible?
I also wonder if SQL is the best option for these types of queries? I've already tried the Neo4j Graph database and while it seemed like a good solution, I would rather avoid using two separate databases.
Scheme:
CREATE TABLE friends(
user_id int,
friend_id int,
min int,
max int
);
CREATE TABLE posts(
id int,
title varchar(255),
rating int,
date date,
user_id int
);
Let's assume we have the following data:
INSERT INTO friends VALUES
(1,2,1,3)
, (1,3,0,5)
, (1,4,2,10);
INSERT INTO posts VALUES
(1, 'posts1', 2, now(), 2)
, (2, 'posts2', 1, now(), 2)
, (3, 'posts3', 5, now(), 2)
, (4, 'posts4', 2, now(), 2)
, (5, 'posts5', 11, now(), 2)
, (6, 'posts6', 7, now(), 2)
, (7, 'posts7', 3, now(), 2)
, (8, 'posts8', 4, now(), 3)
, (9, 'posts9', 1, now(), 3)
, (10, 'posts10', 0, now(), 3)
, (11, 'posts11', 7, now(), 3)
, (12, 'posts12', 3, now(), 3)
, (13, 'posts13', 2, now(), 3)
, (14, 'posts14', 4, now(), 4)
, (15, 'posts15', 9, now(), 4)
, (16, 'posts16', 0, now(), 4)
, (17, 'posts17', 3, now(), 4)
, (18, 'posts18', 2, now(), 4)
, (19, 'posts19', 1, now(), 4)
, (20, 'posts20', 2, now(), 4);
Hence, I would like to see combinations (post_id, title, rating, date, friend_id)
with the following conditions, if possible:
- between 1 and 3 messages from a friend with
id
= 2 - between 0 and 5 messages from a friend with
id
= 3 - from 2 to 10 messages from a friend with
id
= 4
So, basically, if a friend of mine has friend_id=2
published 1 or more articles, I need at least 2 of them. If he has published more than 3 articles, I want no more than 3.
source to share
Let's say I want you to have 2-5 messages from you every day if you post this a lot. If you only post one, everything is fine and I will only have one post.
Your explanation in the comment still doesn't add up. Your number min
will sound without effect according to this explanation.
This is not what you wrote, but it would make sense:
Given the max impressions for posts (external LIMIT
), I want to get posts min
from each friend first (if any). If there are free slots after that, fill up max
messages for a friend.
In the example, which will be 1 ( min
) message from friend 2 with the highest priority and 2 more ( max - min
) messages if the number of slots is still available.
It would be arbitrary for messages to make the cut if there are not enough slots for each priority. I went ahead and assumed that the first post from each should be selected first, etc.
The rest is still arbitrary, but can be easily resolved if you can manage to formulate the requirement.
SELECT *
FROM friends f
, LATERAL (
SELECT *
, row_number() OVER (ORDER BY rating DESC NULLS LAST, id DESC) AS rn
FROM posts p
WHERE user_id = f.friend_id -- LATERAL reference
ORDER BY rating DESC NULLS LAST, date DESC NULLS LAST
LIMIT f.max -- LATERAL reference
) p
WHERE f.user_id = 1
ORDER BY (p.rn > f.min) -- minimum posts from each first
, p.rn
LIMIT 10; -- arbitrary total maximum
Notes
-
Assuming
friends.user_id
uposts.id
are the primary keys. Your table definition is not there. -
All other columns must be defined
NOT NULL
to make sense. -
Use
LATERAL
join to select onlymax
transactions for a friend in a subquery: -
Use , not in a subquery. It is a common mistake to confuse both.
row_number()
rank()
-
You mentioned
date
, but it was not shown in your request. Maybe you really want:, row_number() OVER (ORDER BY rating DESC NULLS LAST , date DESC NULLS LAST) AS rn
-
DESC NULLS LAST
just becauserating
udate
can be NULL: -
In Postgres, you can use a simple boolean expression in
ORDER BY
:ORDER BY (p.rn > f.min), p.rn
- SQL selects the order of queries by day and month
- Sort null values after all other but special values
Which puts
min
messages for a friend first. The second element (p.rn
) gives each friend an equal chance (first post first, etc.). -
Do not use as an identifier. It is a reserved word in standard SQL and the base type name in Postgres.
date
source to share
Am I missing something? Can you add and subq.rnk >= subq.min
to the proposal where
?
SELECT *
FROM (SELECT f.max, f.min, p.friend_id, p.id, p.title, p.rating
row_number() over (partition by p.user_id
order by p.rating DESC, p.id DESC
) as rnk
FROM posts p INNER JOIN
friends f
ON p.friend_id = f.user_id
WHERE f.user_id = 1
) subq
WHERE subq.rnk <= subq.max and subq.rnk >= subq.min
source to share
I think from a neo4j / cypher perspective, this is really what you want to do ...
match (u:User {id: 1})-[r:FOLLOWS]->(p:Publisher)
with u, p, r
match p-[:PUBLISHED]-(i:Item)
with u, p, r, i
order by i.name
return u.name, p.name, i.name
skip 5
limit 2
You will just need to parameterize min and max and link them at runtime, which will involve two requests, not one, but I think this is still an elegant solution. I've tried including properties in skip
and limit
before, but cypher obviously doesn't support this (yet). He wants a parameter or unsigned integer.
match (u:User {id: 1})-[r:FOLLOWS]->(p:Publisher)
with u, p, r
match p-[:PUBLISHED]-(i:Item)
with u, p, r, i
order by i.name
return u.name, p.name, i.name
skip {min}
limit {max}
source to share