Add conditional counts for multiple columns of one table

Question

Add conditional counts for multiple columns of one table

I'm looking for a "better" way to fulfill a query in which I want to show one player he played previously and the associated win record for each such opponent.

These tables are divided into main things:

create table player (player_id int, username text);
create table match (winner_id int, loser_id int);

insert into player values (1, 'john'), (2, 'mary'), (3, 'bob'), (4, 'alice');
insert into match values (1, 2), (1, 2), (1, 3), (1, 4), (1, 4), (1, 4)
                       , (2, 1), (4, 1), (4, 1);

Thus John has a record of 2 wins and 1 loss against Mary; 1 win and 0 losses against Bob; and 3 wins and 2 losses against Alice.

create index idx_winners on match(winner_id);
create index idx_winners on match(loser_id);

I am using Postgres 9.4. Something in the back of my head tells me to somehow consider LATERAL

, but I find it difficult to understand the "shape" of such.

Below is the query I am currently using but something is "disconnecting". Please help me find out and improve this.

select p.username as opponent, 
       coalesce(r.won, 0) as won, 
       coalesce(r.lost, 0) as lost
from (
    select m.winner_id, m.loser_id, count(m.*) as won, (
        select t.lost
        from (
            select winner_id, loser_id, count(*) as lost
            from match
            where loser_id = m.winner_id
            and winner_id = m.loser_id
            group by winner_id, loser_id
        ) t 
    )   
    from match m
    where m.winner_id = 1   -- this would be a parameter
    group by m.winner_id, m.loser_id
) r 
join player p on p.player_id = r.loser_id;

This works as expected. Just want to learn some tricks, or better yet, the correct techniques to do the same.

opponent  won  lost
--------  ---  ----
alice     3    2
bob       1    0
mary      2    1

+3

sql join aggregate-functions postgresql

Brian H June 22. 15 at 15:36

source to share

4 answers

Correlated subquery solution:

SELECT *,
       (SELECT COUNT(*) FROM match WHERE loser_id = p.player_id),
       (SELECT COUNT(*) FROM match WHERE winner_id = p.player_id)
FROM dbo.player p WHERE player_id <> 1

Solution c UNION

and conditional aggregation:

SELECT  t.loser_id ,
        SUM(CASE WHEN result = 1 THEN 1 ELSE 0 END) ,
        SUM(CASE WHEN result = -1 THEN 1 ELSE 0 END)
FROM    ( SELECT    * , 1 AS result
          FROM      match
          WHERE     winner_id = 1
          UNION ALL
          SELECT    loser_id , winner_id , -1 AS result
          FROM      match
          WHERE     loser_id = 1
        ) t
GROUP BY t.loser_id

+3

Giorgi nakeuri June 22. 15 at 16:22

source to share

For one "subjective" player, I simply combine the player in both winning and losing roles and sum the wins / losses:

SELECT opponent, SUM(won) as won, SUM(lost) as lost
FROM
(
    select w.username AS opponent, 0 AS won, 1 as lost, m.loser_id as me
    from "match" m
     inner join "player" w on m.winner_id = w.player_id

    UNION ALL

    select l.username AS opponent, 1 AS won, 0 as lost, m.winner_id as me
    from "match" m
     inner join "player" l on m.loser_id = l.player_id
) x
WHERE me = 1
GROUP BY opponent;

For a set based operation, we can simply join the players into the same derived table:

SELECT p.username as player, x.opponent, SUM(x.won) as won, SUM(x.lost) as lost
FROM "player" p
LEFT JOIN
(
    select w.username AS opponent, 0 AS won, 1 as lost, m.loser_id as me
    from "match" m
     inner join "player" w on m.winner_id = w.player_id

    UNION ALL

    select l.username AS opponent, 1 AS won, 0 as lost, m.winner_id as me
    from "match" m
     inner join "player" l on m.loser_id = l.player_id
) x
on p.player_id = x.me
GROUP BY player, opponent;

SqlFiddles like here

One small dot - index names must be unique - presumably you meant:

create index idx_winners on match(winner_id);
create index idx_losers on match(loser_id);

+2

StuartLC June 22. 15 at 16:31

source to share

Something more readable than my original. Thoughts?

with W as (
    select loser_id as opponent_id,
    count(*) as n
    from match
    where winner_id = 1
    group by loser_id
),
L as (
    select winner_id as opponent_id,
    count(*) as n
    from match
    where loser_id = 1
    group by winner_id
)
select player.username, coalesce(W.n, 0) as wins, coalesce(L.n, 0) as losses
from player
left join W on W.opponent_id = player.player_id
left join L on L.opponent_id = player.player_id
where player.player_id != 1;

                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Hash Left Join  (cost=73.78..108.58 rows=1224 width=48)
   Hash Cond: (player.player_id = l.opponent_id)
   CTE w
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match.loser_id
           ->  Seq Scan on match  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (winner_id = 1)
   CTE l
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match_1.winner_id
           ->  Seq Scan on match match_1  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (loser_id = 1)
   ->  Hash Left Join  (cost=0.07..30.15 rows=1224 width=44)
         Hash Cond: (player.player_id = w.opponent_id)
         ->  Seq Scan on player  (cost=0.00..25.38 rows=1224 width=36)
               Filter: (player_id <> 1)
         ->  Hash  (cost=0.04..0.04 rows=2 width=12)
               ->  CTE Scan on w  (cost=0.00..0.04 rows=2 width=12)
   ->  Hash  (cost=0.04..0.04 rows=2 width=12)
         ->  CTE Scan on l  (cost=0.00..0.04 rows=2 width=12)

The above performance metric has player_id! = 1. I think I can avoid this by just scanning the connection results, right?

explain with W as (
        select loser_id as opponent_id,
        count(*) as n
        from match
        where winner_id = 1 
        group by loser_id
    ),  
    L as (
        select winner_id as opponent_id,
        count(*) as n
        from match
        where loser_id = 1 
        group by winner_id
    )   
    select t.* from (
        select player.player_id, player.username, coalesce(W.n, 0) as wins, coalesce(L.n, 0) as losses
        from player
        left join W on W.opponent_id = player.player_id
        left join L on L.opponent_id = player.player_id
    ) t 
    where t.player_id != 1;

                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Hash Left Join  (cost=73.78..74.89 rows=3 width=52)
   Hash Cond: (player.player_id = l.opponent_id)
   CTE w
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match.loser_id
           ->  Seq Scan on match  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (winner_id = 1)
   CTE l
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match_1.winner_id
           ->  Seq Scan on match match_1  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (loser_id = 1)
   ->  Hash Left Join  (cost=0.07..1.15 rows=3 width=44)
         Hash Cond: (player.player_id = w.opponent_id)
         ->  Seq Scan on player  (cost=0.00..1.05 rows=3 width=36)
               Filter: (player_id <> 1)
         ->  Hash  (cost=0.04..0.04 rows=2 width=12)
               ->  CTE Scan on w  (cost=0.00..0.04 rows=2 width=12)
   ->  Hash  (cost=0.04..0.04 rows=2 width=12)
         ->  CTE Scan on l  (cost=0.00..0.04 rows=2 width=12)

0

Brian H June 22. 15 at 16:50

source to share

Erwin Brandstetter · Accepted Answer · 2015-06-23T01:24:20+0000

Query

The request is not as simple as it looks at first. The shortest query string does not necessarily give the best performance. This should be as short as possible as fast as possible:

SELECT p.username, COALESCE(w.ct, 0) AS won, COALESCE(l.ct, 0) AS lost
FROM  (
   SELECT loser_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  winner_id = 1  -- your player_id here
   GROUP  BY 1           -- positional reference (not your player_id)
   ) w
FULL JOIN (
   SELECT winner_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  loser_id = 1   -- your player_id here
   GROUP  BY 1
   ) l USING (player_id)
JOIN   player p USING (player_id)
ORDER  BY 1;

The result is exactly as requested:

username | won | lost
---------+-----+-----
alice    | 3   | 2
bob      | 1   | 0
mary     | 2   | 1

SQL Fiddle - with more revealing test data!

The key feature is between the two subqueries for losses and gains. This creates a table of all the players our candidate played against. A join clause conveniently combines two columns into one. FULL [OUTER] JOIN

USING

player_id

After that one JOIN

up player

to get the name, and replace NULL with 0. Voilá. COALESCE

Index

It will be even faster with two multi-column indexes :

CREATE INDEX idx_winner on match (winner_id, loser_id);
CREATE INDEX idx_loser  on match (loser_id, winner_id);

Only if you get indexing from just that. Then Postgres doesn't even visit the table match

and you get super fast results.

With two columns, integer

you hit the local optimum: these indexes are the same size as the simple ones. Details:

Is a composite index useful for queries on the first field?

Shorter but slower

You can run correlated subqueries like @Giorgi only working correctly:

SELECT *
FROM  (
   SELECT username
       , (SELECT count(*) FROM match
          WHERE  loser_id  = p.player_id
          AND    winner_id = 1) AS won
       , (SELECT count(*) FROM match
          WHERE  winner_id = p.player_id
          AND    loser_id  = 1) AS lost
   FROM   player p
   WHERE  player_id <> 1
   ) sub
WHERE (won > 0 OR lost > 0)
ORDER  BY username;

Works great for small tables, but doesn't scale. This requires a sequential scan for player

and two indices match

for each player. Compare performance with EXPLAIN ANALYZE

.

Add conditional counts for multiple columns of one table

Query

Index

Shorter but slower

More articles: