Removing duplicates from complex MySQL Select Query
All my projects are stored in the com_projects table. Each project can have multiple locations, which are stored in the com_location table. The com_country table is just a list of country names that are used in the com_location table to display the country name.
In the com_project table, I have a category_id field that can range from 1 to 4.
Below is a SELECT query - my attempt at counting the total number of times a category is assigned to a country.
---------------------------------------------------------------
| country | category_1 | category_2 | category_3 | category_4 |
---------------------------------------------------------------
| USA | 20 | 5 | 3 | 0 |
---------------------------------------------------------------
| UK | 1 | 12 | 0 | 0 |
etc....
SELECT b.country_id, c.name,
SUM(case when a.category_id = 1 then 1 else 0 end) as category_1,
SUM(case when a.category_id = 2 then 1 else 0 end) as category_2,
SUM(case when a.category_id = 3 then 1 else 0 end) as category_3,
SUM(case when a.category_id = 4 then 1 else 0 end) as category_4
FROM com_project a
Inner JOIN com_location b
ON a.id = b.project_id
INNER JOIN com_country c
ON c.id = b.country_id
WHERE a.state = 1
AND b.state = 1
GROUP BY b.country_id
I would like the category to be increased if the project has one or more locations in different countries. The problem I have is that many projects have multiple locations in the same country and artificially increase results.
How can I customize the SELECT statement to avoid duplication from one country?
BTW, I tried adding DISTINCT right after SELECT and it didn't help.
I think the suggestion where
will be used to filter projects that are always in the same country:
SELECT l.country_id, c.name,
SUM(p.category_id = 1) as category_1,
SUM(p.category_id = 2) as category_2,
SUM(p.category_id = 3) as category_3,
SUM(p.category_id = 4) as category_4
FROM com_project p Inner JOIN
com_location l
ON p.id = l.project_id INNER JOIN
com_country c
ON c.id = l.country_id
WHERE p.state = 1 AND l.state = 1 AND
EXISTS (SELECT 1
FROM com_location l2
WHERE l2.project_id = l.project_id AND l2.country <> l.country
)
GROUP BY l.country_id, c.name;
This just checks that each aggregated project has at least one more country.