The problem with DISTINCT!

Here is my query:

SELECT 
DISTINCT `c`.`user_id`,
`c`.`created_at`,
`c`.`body`,
(SELECT COUNT(*) FROM profiles_comments c2 WHERE c2.user_id = c.user_id AND c2.profile_id = 1) AS `comments_count`,
`u`.`username`,
`u`.`avatar_path` 
FROM `profiles_comments` AS `c` INNER JOIN `users` AS `u` ON u.id = c.user_id 
WHERE (c.profile_id = 1) ORDER BY `u`.`id` DESC;

      

It runs, but the problem is with the DISTINCT keyword. As I understand it, it should select only one row per c.user_id.

Instead I get 4-5 rows with the same c.user_id value. What is wrong?

+2




5 answers


In fact, DISTINCT is not limited to a single column. Basically, when you say:

SELECT DISTINCT a, b



What you are saying is "give me the distinct values of a and b combined", just like a UNIQUE index with multiple columns.
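
To make the "a and b combined" point concrete, here is a minimal sketch. It assumes a hypothetical table t(a, b) with made-up rows, and it uses Python's built-in sqlite3 module as a stand-in for MySQL; DISTINCT should behave the same way in both for this case.

```python
import sqlite3

# Hypothetical table t(a, b); SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO t (a, b) VALUES (?, ?)",
    [(1, "x"), (1, "x"), (1, "y"), (2, "x")],
)

# DISTINCT removes only rows where the whole (a, b) pair repeats.
pairs = conn.execute("SELECT DISTINCT a, b FROM t ORDER BY a, b").fetchall()

# With a single selected column, DISTINCT deduplicates that column alone.
only_a = conn.execute("SELECT DISTINCT a FROM t ORDER BY a").fetchall()

print(pairs)   # [(1, 'x'), (1, 'y'), (2, 'x')]
print(only_a)  # [(1,), (2,)]
```

The value a = 1 still appears twice in the first result because it is paired with two different values of b.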

+9




DISTINCT ensures that the combination of ALL values in your select clause is unique, not just user_id. If you want to limit the results to one row per user_id, you should group by user_id.

Maybe you want:



SELECT 
`c`.`user_id`,
`u`.`username`,
`u`.`avatar_path`,
(SELECT COUNT(*) FROM profiles_comments c2 WHERE c2.user_id = c.user_id AND c2.profile_id = 1) AS `comments_count` 
FROM `profiles_comments` AS `c` INNER JOIN `users` AS `u` ON u.id = c.user_id 
WHERE (c.profile_id = 1) 
GROUP BY `c`.`user_id`,
`u`.`username`,
`u`.`avatar_path`
ORDER BY `u`.`id` DESC;
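
As a rough check of the difference, the sketch below rebuilds the two tables from the question with invented sample rows (the users, dates and bodies are assumptions) and runs both styles of query through Python's sqlite3 module, standing in for MySQL. For brevity it uses COUNT(*) on the grouped rows instead of the correlated subquery; with this WHERE clause the two should give the same count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema follows the column names in the question; the data is made up.
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, avatar_path TEXT);
CREATE TABLE profiles_comments (
    user_id INTEGER, profile_id INTEGER, created_at TEXT, body TEXT
);
INSERT INTO users VALUES (1, 'alice', 'a.png'), (2, 'bob', 'b.png');
INSERT INTO profiles_comments VALUES
    (1, 1, '2011-01-01', 'first'),
    (1, 1, '2011-01-02', 'second'),
    (2, 1, '2011-01-03', 'third');
""")

# DISTINCT over user_id plus per-comment columns: every row differs, so all survive.
distinct_rows = conn.execute("""
    SELECT DISTINCT c.user_id, c.created_at, c.body
    FROM profiles_comments AS c
    WHERE c.profile_id = 1
""").fetchall()

# GROUP BY the user columns only: one row per user, with a comment count.
grouped_rows = conn.execute("""
    SELECT c.user_id, u.username, u.avatar_path, COUNT(*) AS comments_count
    FROM profiles_comments AS c
    INNER JOIN users AS u ON u.id = c.user_id
    WHERE c.profile_id = 1
    GROUP BY c.user_id, u.username, u.avatar_path
    ORDER BY c.user_id DESC
""").fetchall()

print(len(distinct_rows))  # 3
print(grouped_rows)        # [(2, 'bob', 'b.png', 1), (1, 'alice', 'a.png', 2)]
```

Dropping created_at and body from the select list is what makes one row per user possible: those columns have a different value for every comment.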

      

+6




DISTINCT works at the row level, not just the column level

If you want only one column to be distinct, you will need to group by that column and aggregate the rest of the columns (MIN, MAX, SUM, AVG, etc.)

SELECT Name, MIN(ID)
FROM MyTable
GROUP BY Name

      

+2




DISTINCT returns only unique rows; it does not return just one row per user id in your example.

http://dev.mysql.com/doc/refman/5.0/en/distinct-optimization.html

+1




You misunderstand. The DISTINCT modifier is applied to the entire row: it states that no two identical ROWS will be returned in the result set.

Look at your SQL: which of the several available values do you expect to see in the created_at column (for example)? If only one row per user were returned, it would be impossible to predict the results of the query as written.

Also, you are using profiles_comments twice in your SELECT. It looks like you are trying to count how many times each user has commented. If so, then you should use an aggregate query, grouped by user_id and including only those columns that uniquely identify the user, along with a COUNT of comments:

SELECT user_id, COUNT(*) FROM profiles_comments WHERE profile_id = 1 GROUP BY user_id

You can add a join to users to get the username if you want, but logically your result set cannot include other columns from profiles_comments and still produce only one row per user_id, unless those columns are also aggregated in some way:

SELECT user_id, MIN(created_at) AS Earliest, MAX(created_at) AS Latest, COUNT(*) FROM profiles_comments WHERE profile_id = 1 GROUP BY user_id
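
The aggregate query, with the optional join to users added, can be sketched as follows. The sample rows are invented, the tables carry only the columns the query needs, and Python's sqlite3 module stands in for MySQL, so treat this as an illustration rather than a drop-in query. The dates are stored as ISO-formatted text so that MIN and MAX order them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down versions of the question's tables; the data is made up.
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE profiles_comments (user_id INTEGER, profile_id INTEGER, created_at TEXT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO profiles_comments VALUES
    (1, 1, '2011-01-01'),
    (1, 1, '2011-01-05'),
    (1, 2, '2011-01-09'),
    (2, 1, '2011-01-03');
""")

# One row per user: per-comment columns appear only inside aggregates.
rows = conn.execute("""
    SELECT c.user_id, u.username,
           MIN(c.created_at) AS earliest,
           MAX(c.created_at) AS latest,
           COUNT(*) AS comments_count
    FROM profiles_comments AS c
    INNER JOIN users AS u ON u.id = c.user_id
    WHERE c.profile_id = 1
    GROUP BY c.user_id, u.username
    ORDER BY c.user_id
""").fetchall()

for row in rows:
    print(row)
# (1, 'alice', '2011-01-01', '2011-01-05', 2)
# (2, 'bob', '2011-01-03', '2011-01-03', 1)
```

The comment on profile 2 is filtered out by the WHERE clause before grouping, so it affects neither the count nor the latest date.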

+1








