PostgreSQL: select nearest rows according to sort order

Question

I have a table like this:

     a    |  user_id
----------+-------------
  0.1133  |  2312882332
  4.3293  |  7876123213
  3.1133  |  2312332332
  1.3293  |  7876543213
  0.0033  |  2312222332
  5.3293  |  5344343213
  3.2133  |  4122331112
  2.3293  |  9999942333

I want to look up a specific row, for example 1.3293 | 7876543213, and select the 4 nearest rows: 2 above and 2 below, if possible.
The sort order is ORDER BY a ASC.

In this case, I want to get:

  0.0033  |  2312222332
  0.1133  |  2312882332
  2.3293  |  9999942333
  3.1133  |  2312332332

How can I achieve this using PostgreSQL? (I'm using PHP by the way.)

PS: If the specified row is the first or the last one, all 4 rows should come from below or above it, respectively.

+3

sql-order-by postgresql rows selection window-functions

user1282226 24 Mar 12 at 13:38

3 answers

And one more:

WITH prec_rows AS
  (SELECT a,
          user_id,
          ROW_NUMBER() OVER (ORDER BY a DESC) AS rn
   FROM tbl
   WHERE a < 1.3293
   ORDER BY a DESC LIMIT 4),
     succ_rows AS
  (SELECT a,
          user_id,
          ROW_NUMBER() OVER (ORDER BY a ASC) AS rn
   FROM tbl
   WHERE a > 1.3293
   ORDER BY a ASC LIMIT 4)
SELECT a, user_id
FROM
  (SELECT a,
          user_id,
          rn
   FROM prec_rows
   UNION ALL SELECT a,
                    user_id,
                    rn
   FROM succ_rows) AS s
ORDER BY rn, a LIMIT 4;

AFAIR, WITH materializes its result as an in-memory table, so the point of this solution is to keep that result as small as possible (in this case, at most eight rows).

+2

Tim Landscheidt March 25 12 at 1:56

set search_path='tmp';

DROP TABLE lutser;
CREATE TABLE lutser
        ( val float
        , num bigint
        );
INSERT INTO lutser(val, num)
VALUES ( 0.1133  ,  2312882332  )
      ,( 4.3293  ,  7876123213  )
      ,( 3.1133  ,  2312332332  )
      ,( 1.3293  ,  7876543213  )
      ,( 0.0033  ,  2312222332  )
      ,( 5.3293  ,  5344343213  )
      ,( 3.2133  ,  4122331112  )
      ,( 2.3293  ,  9999942333  )
        ;

WITH ranked_lutsers AS (
        SELECT val, num
        ,rank() OVER (ORDER BY val) AS rnk
        FROM lutser
        )
SELECT that.val, that.num
        , (that.rnk-this.rnk) AS relrnk
FROM ranked_lutsers that
JOIN ranked_lutsers this ON (that.rnk BETWEEN this.rnk-2 AND this.rnk+2)
WHERE this.val = 1.3293
        ;

Results:

DROP TABLE
CREATE TABLE
INSERT 0 8
  val   |    num     | relrnk 
--------+------------+--------
 0.0033 | 2312222332 |     -2
 0.1133 | 2312882332 |     -1
 1.3293 | 7876543213 |      0
 2.3293 | 9999942333 |      1
 3.1133 | 2312332332 |      2
(5 rows)

As Erwin pointed out, the center row is not needed in the output. Also, row_number() should be used instead of rank().

WITH ranked_lutsers AS (
        SELECT val, num
        -- ,rank() OVER (ORDER BY val) AS rnk
        , row_number() OVER (ORDER BY val, num) AS rnk
        FROM lutser
) SELECT that.val, that.num
        , (that.rnk-this.rnk) AS relrnk
FROM ranked_lutsers that
JOIN ranked_lutsers this ON (that.rnk BETWEEN this.rnk-2 AND this.rnk+2 )
WHERE this.val = 1.3293
AND that.rnk <> this.rnk
        ;

Result2:

  val   |    num     | relrnk 
--------+------------+--------
 0.0033 | 2312222332 |     -2
 0.1133 | 2312882332 |     -1
 2.3293 | 9999942333 |      1
 3.1133 | 2312332332 |      2
(4 rows)

UPDATE2: always select four rows, even if we are at the top or bottom of the list. This makes the query a little ugly (but not as ugly as Erwin's ;)

WITH ranked_lutsers AS (
        SELECT val, num
        -- ,rank() OVER (ORDER BY val) AS rnk
        , row_number() OVER (ORDER BY val, num) AS rnk
        FROM lutser
) SELECT that.val, that.num
        , ABS(that.rnk-this.rnk) AS srtrnk
        , (that.rnk-this.rnk) AS relrnk
FROM ranked_lutsers that
JOIN ranked_lutsers this ON (that.rnk BETWEEN this.rnk-4 AND this.rnk+4 )
-- WHERE this.val = 1.3293
WHERE this.val = 0.1133
AND that.rnk <> this.rnk
ORDER BY srtrnk ASC
LIMIT 4
        ;

Output:

  val   |    num     | srtrnk | relrnk 
--------+------------+--------+--------
 0.0033 | 2312222332 |      1 |     -1
 1.3293 | 7876543213 |      1 |      1
 2.3293 | 9999942333 |      2 |      2
 3.1133 | 2312332332 |      3 |      3
(4 rows)

UPDATE: a version with a nested CTE (with an outer join!). For convenience, I've added a primary key to the table, which sounds like a good idea anyway, IMHO.
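
The statement that adds the key is not shown in the answer. A minimal sketch of one way to do it, assuming a serial column named id (the name is taken from the query that follows; the column type is a guess):

```sql
-- Assumption: a surrogate key named "id", matching the id column
-- referenced by the nested-CTE query.
ALTER TABLE lutser ADD COLUMN id serial PRIMARY KEY;

-- Check which id each existing row received before relying on a
-- literal such as di.one = 1.
SELECT id, val, num FROM lutser ORDER BY id;
```

The order in which existing rows receive their id values is not something the answer specifies, so it is safer to look the id up than to assume it.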

WITH distance AS (
        WITH ranked_lutsers AS (
        SELECT id
        , row_number() OVER (ORDER BY val, num) AS rnk
        FROM lutser
        ) SELECT l0.id AS one
        ,l1.id AS two
        , ABS(l1.rnk-l0.rnk) AS dist
        -- Warning: Cartesian product below
        FROM ranked_lutsers l0
        , ranked_lutsers l1 WHERE l0.id <> l1.id

        )
SELECT lu.*
FROM lutser lu
JOIN distance di
ON lu.id = di.two
WHERE di.one= 1
ORDER by di.dist
LIMIT 4 
        ;

0

wildplasser 24 Mar 12 at 14:29

Erwin Brandstetter · Accepted Answer · 2012-03-24T14:29:07+0000

Test setup:

CREATE TEMP TABLE tbl(a float, user_id bigint);
INSERT INTO tbl VALUES
 (0.1133, 2312882332)
,(4.3293, 7876123213)
,(3.1133, 2312332332)
,(1.3293, 7876543213)
,(0.0033, 2312222332)
,(5.3293, 5344343213)
,(3.2133, 4122331112)
,(2.3293, 9999942333);

Query:

WITH x AS (
    SELECT a
          ,user_id
          ,row_number() OVER (ORDER BY a, user_id) AS rn
    FROM   tbl
    ), y AS (
    SELECT rn, LEAST(rn - 3, (SELECT max(rn) - 5 FROM x)) AS min_rn
    FROM   x
    WHERE  (a, user_id) = (1.3293, 7876543213)
    )
SELECT *
FROM   x, y
WHERE  x.rn  > y.min_rn
AND    x.rn <> y.rn
ORDER  BY x.a, x.user_id
LIMIT  4;

Returns the result as shown in the question, assuming that (a, user_id) is unique.

It is unclear whether a is unique. This is why I additionally sort by user_id to break ties. This is also why I use the window function row_number(), not rank(). row_number() is the right tool in any case: we need exactly 4 rows, and rank() would produce an undefined number of rows if there were ties in the sort order.
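
To make the difference concrete, here is a small sketch on made-up values (the inline VALUES list is an illustration only, not data from the question):

```sql
-- Two rows share val = 2.0, so they are peers in the sort order.
SELECT val
      ,rank()       OVER (ORDER BY val) AS rnk  -- peers get the same rank
      ,row_number() OVER (ORDER BY val) AS rn   -- every row gets its own number
FROM  (VALUES (1.0), (2.0), (2.0), (3.0)) AS t(val)
ORDER  BY val, rn;
-- rnk comes out as 1, 2, 2, 4 (with a gap after the tie);
-- rn  comes out as 1, 2, 3, 4.
```

A filter like "rnk BETWEEN n-2 AND n+2" can therefore match more or fewer than five rows when ties exist, while the same filter on rn always matches a fixed-size window.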

This always returns 4 rows if the table has at least 5 rows. Close to the first or last row, the first or last 4 rows are returned. In all other cases, two rows before and two rows after are returned. The criteria row itself is excluded.

Improved performance

This is an improved version of what @Tim Landscheidt posted. Vote for his answer if you like the index idea. Don't bother for small tables. But it will improve performance for large tables, provided a suitable index is in place. The best choice would be a multicolumn index on (a, user_id).
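
Such an index could be created as follows. This is a sketch for the test table tbl above; the index name is made up:

```sql
-- Assumption: the index name is a hypothetical example.
CREATE INDEX tbl_a_user_id_idx ON tbl (a, user_id);
```

A single B-tree index is enough for both halves of the query below, because PostgreSQL can scan it forward for ORDER BY a ASC, user_id ASC and backward for ORDER BY a DESC, user_id DESC.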

WITH params(_a, _user_id) AS (SELECT 5.3293, 5344343213) -- enter params once
    ,x AS  (
    (
    SELECT a
          ,user_id
          ,row_number() OVER (ORDER BY a DESC, user_id DESC) AS rn
    FROM   tbl, params p
    WHERE  a < p._a
       OR  a = p._a AND user_id < p._user_id -- a is not defined unique
    ORDER  BY a DESC, user_id DESC
    LIMIT  5  -- 4 + 1: including central row
    )
    UNION ALL -- UNION right away, trim one query level
    (
    SELECT a
          ,user_id
          ,row_number() OVER (ORDER BY a ASC, user_id ASC) AS rn
    FROM   tbl, params p
    WHERE  a > p._a
       OR  a = p._a AND user_id > p._user_id
    ORDER  BY a ASC, user_id ASC
    LIMIT  5
    )
    )
    , y AS (
    SELECT a, user_id
    FROM   x, params p
    WHERE (a, user_id) <> (p._a, p._user_id) -- exclude central row
    ORDER  BY rn  -- no need to ORDER BY a
    LIMIT  4
    )
SELECT *
FROM   y
ORDER  BY a, user_id;  -- ORDER result as requested

The main differences from @Tim's version:

In accordance with the question, (a, user_id) forms the search criteria, not just a. This changes the window definition, the ORDER BY and the WHERE clause.
UNION right away; there is no need for an additional query level. The two UNION-ed queries need parentheses around them to allow separate ORDER BY clauses.
Sort the result as requested. This requires another query level (at little or no cost).
Since the parameters are used in multiple places, I centralized the input in a leading CTE.
For repeated use, you can wrap this query almost as is in an SQL or plpgsql function.
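
The answer does not show such a function. A minimal sketch of what the wrapper might look like, assuming PostgreSQL 9.2 or later (where SQL functions can refer to parameters by name) and that the table tbl from the test setup already exists; the function name nearest_rows is made up, and the params CTE is replaced by the function arguments:

```sql
-- Hypothetical wrapper; not part of the original answer.
CREATE OR REPLACE FUNCTION nearest_rows(_a float, _user_id bigint)
  RETURNS TABLE (a float, user_id bigint)
  LANGUAGE sql STABLE AS
$func$
WITH x AS (
    (
    SELECT t.a, t.user_id
          ,row_number() OVER (ORDER BY t.a DESC, t.user_id DESC) AS rn
    FROM   tbl t
    WHERE  t.a < _a
       OR  t.a = _a AND t.user_id < _user_id
    ORDER  BY t.a DESC, t.user_id DESC
    LIMIT  5
    )
    UNION ALL
    (
    SELECT t.a, t.user_id
          ,row_number() OVER (ORDER BY t.a ASC, t.user_id ASC) AS rn
    FROM   tbl t
    WHERE  t.a > _a
       OR  t.a = _a AND t.user_id > _user_id
    ORDER  BY t.a ASC, t.user_id ASC
    LIMIT  5
    )
    )
    , y AS (
    SELECT x.a, x.user_id
    FROM   x
    WHERE  (x.a, x.user_id) <> (_a, _user_id)  -- exclude central row
    ORDER  BY x.rn
    LIMIT  4
    )
SELECT y.a, y.user_id
FROM   y
ORDER  BY y.a, y.user_id;
$func$;

-- Example call for the row from the question:
SELECT * FROM nearest_rows(1.3293, 7876543213);
```

All column references are table-qualified so that they cannot be confused with the output columns a and user_id declared in RETURNS TABLE.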
