Left outer join at two levels deep in Postgres results in cartesian product

Considering the following 4 tables:

CREATE TABLE events ( id, name )
CREATE TABLE profiles ( id, event_id )
CREATE TABLE donations ( amount, profile_id )
CREATE TABLE event_members( id, event_id, user_id )

      

I am trying to get a list of all events, as well as the number of any members and the amount of any donations. The problem is that the donated amount is returned incorrectly (represented by the Cartesian result of donations * # event_members).

Here is a SQL query (Postgres)

SELECT events.name, COUNT(DISTINCT event_members.id), SUM(donations.amount)
FROM            events
LEFT OUTER JOIN profiles      ON events.id = profiles.event_id
LEFT OUTER JOIN donations     ON donations.profile_id = profiles.id
LEFT OUTER JOIN event_members ON event_members.event_id = events.id
GROUP BY events.name

      

Amount (donations.amount) returns = actual donation amount * number of rows in event_members. If I comment out the count (separate event_members.id) and event_members left the outer join, the amount is correct.

Edit: Erwin pointed me in the right direction. The request is rewritten as:

SELECT events.name, COUNT (DISTINCT event_members.id), 
  select (SUM (donations.amount) from donations, profiles where donations.profile_id = profiles.id and profiles.event_id = events.id) as total_donations
    FROM events
    LEFT OUTER JOIN event_members ON event_members.event_id = events.id
    GROUP BY events.name
+3


source to share


4 answers


It seems that you have two independent structs ( -[

means 1-N

):

events -[ profiles -[ donations
events -[ event members

      



I wrapped the second one in a subquery:

SELECT events.name,
  member_count.the_member_count
  COUNT(DISTINCT event_members.id),
  SUM(donations.amount)

FROM            events
LEFT OUTER JOIN profiles      ON events.id = profiles.event_id
LEFT OUTER JOIN donations     ON donations.profile_id = profiles.id

LEFT OUTER JOIN (
  SELECT
    event_id,
    COUNT(*) AS the_member_count
  FROM event_members
  GROUP BY event_id
) AS member_count
  ON member_count.event_id = events.id

GROUP BY events.name

      

+2


source


As I explained in detail on the above question , you need to aggregate first and then join tables to avoid the proxy CROSS JOIN

. For example:

SELECT e.name, e.sum_donations, m.ct_members
FROM (
    SELECT e.id, e.name, SUM(d.amount) AS sum_donations
    FROM   events             e
    LEFT   JOIN profiles      p ON p.event_id = e.id
    LEFT   JOIN donations     d ON d.profile_id = p.id
    GROUP  BY 1, 2
    ) e
LEFT   JOIN (
    SELECT event_id, COUNT(DISTINCT id) AS ct_members
    FROM   event_members
    GROUP  BY 1
    ) m ON m.event_id = e.id

      

IF event_members.id

is the primary key (as one would guess), you can simplify



COUNT(*) AS ct_members

      

as id

guaranteed UNIQUE NOT NULL

. It's a little faster.

+4


source


Of course, you get a Cartesian product between donations and events for each event, since both are only associated with the event, there is no join relationship between donations and events other than the event id, which of course means that each term corresponds to every donation.

+1


source


When you execute your request, you are requesting all events - let's say there are two, the Alpha event and the Beta event, then JOIN members. Let's say that there is a member Alice who participates in both events.

SELECT events.name, COUNT(DISTINCT event_members.id), SUM(donations.amount)
FROM            events
LEFT OUTER JOIN profiles      ON events.id = profiles.event_id
LEFT OUTER JOIN donations     ON donations.profile_id = profiles.id
LEFT OUTER JOIN event_members ON event_members.event_id = events.id
GROUP BY events.name

      

On each line, you have specified the total amount for Alice's donations. If Alice donated $ 100, you asked for:

Alpha  Alice  100USD
Beta   Alice  100USD

      

So it's no surprise that when asking for the total, Alice stands for donating $ 200.

If you want the sum of all donations, you are better off with two different requests. Trying to do everything with a single query, if possible, would be the classic SQL Antipattern (actually the one in Chapter 18, "Spaghetti Query"):

Unintentional foods

One common consequence of producing all of your results in one query is the Cartesian product. This happens when two of the table in the query do not have conditions restricting their relationship. Without this limitation, joining two pairs of tables, every row in the first table refers to every row in the other table. Each such pairing becomes a row in the result set, and you end up with many more rows that you would expect.

0


source







All Articles