SQL - find all instances where two columns are the same
So, I have a simple table containing comments from a user that belong to a specific blog post.
id | user | post_id | comment
----------------------------------------------------------
0 | john@test.com | 1001 | great article
1 | bob@test.com | 1001 | nice post
2 | john@test.com | 1002 | I agree
3 | john@test.com | 1001 | thats cool
4 | bob@test.com | 1002 | thanks for sharing
5 | bob@test.com | 1002 | really helpful
6 | steve@test.com | 1001 | spam post about pills
I want to get all instances in which a user commented on the same post more than once (that is, rows that share the same user and the same post_id). In this case, I would return:
id | user | post_id | comment
----------------------------------------------------------
0 | john@test.com | 1001 | great article
3 | john@test.com | 1001 | thats cool
4 | bob@test.com | 1002 | thanks for sharing
5 | bob@test.com | 1002 | really helpful
I thought DISTINCT was what I needed, but it just gives me unique rows.
DISTINCT removes duplicate rows, so you only get unique rows back. It cannot tell you which rows had duplicates.
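To see why DISTINCT does not help here, this is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite and the table name comments are assumptions made only so the example runs; the thread itself does not name them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; "user" is quoted in case an engine reserves the word.
conn.execute(
    'CREATE TABLE comments (id INTEGER, "user" TEXT, post_id INTEGER, comment TEXT)'
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?)",
    [
        (0, "john@test.com", 1001, "great article"),
        (1, "bob@test.com", 1001, "nice post"),
        (2, "john@test.com", 1002, "I agree"),
        (3, "john@test.com", 1001, "thats cool"),
        (4, "bob@test.com", 1002, "thanks for sharing"),
        (5, "bob@test.com", 1002, "really helpful"),
        (6, "steve@test.com", 1001, "spam post about pills"),
    ],
)

# DISTINCT over whole rows changes nothing: every row has a different id.
all_distinct = conn.execute("SELECT DISTINCT * FROM comments").fetchall()

# DISTINCT over (user, post_id) collapses repeats into a single pair,
# so the result no longer says which pairs occurred more than once.
pairs = conn.execute(
    'SELECT DISTINCT "user", post_id FROM comments ORDER BY "user", post_id'
).fetchall()

print(len(all_distinct))
print(pairs)
```

All seven rows survive the first query, and the second returns one row per (user, post_id) pair whether that pair appeared once or twice.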
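You can try joining the table to itself with CROSS JOIN (available as of Hive 0.10 according to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins ). Compare the ids as well, otherwise every row matches itself and the whole table comes back:
SELECT DISTINCT mt.*
FROM MYTABLE mt
CROSS JOIN MYTABLE mt2
WHERE mt.user = mt2.user
AND mt.post_id = mt2.post_id
AND mt.id <> mt2.id -- skip the match of a row with itself
DISTINCT keeps a row from being listed several times when a user has three or more comments on the same post.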
Performance may not be the best. If you want to sort the result, use SORT BY or ORDER BY (in Hive, ORDER BY gives a total order, while SORT BY only sorts within each reducer).
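The self-join idea can be checked against the question's sample rows with a small sketch. It assumes SQLite through Python's sqlite3 module rather than Hive, and the table name mytable is hypothetical. Note the id comparison: without it each row pairs with itself and every row is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; "user" is quoted in case an engine reserves the word.
conn.execute(
    'CREATE TABLE mytable (id INTEGER, "user" TEXT, post_id INTEGER, comment TEXT)'
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (0, "john@test.com", 1001, "great article"),
        (1, "bob@test.com", 1001, "nice post"),
        (2, "john@test.com", 1002, "I agree"),
        (3, "john@test.com", 1001, "thats cool"),
        (4, "bob@test.com", 1002, "thanks for sharing"),
        (5, "bob@test.com", 1002, "really helpful"),
        (6, "steve@test.com", 1001, "spam post about pills"),
    ],
)

query = """
SELECT DISTINCT mt.id, mt."user", mt.post_id, mt.comment
FROM mytable mt
JOIN mytable mt2
  ON mt."user" = mt2."user"
 AND mt.post_id = mt2.post_id
 AND mt.id <> mt2.id            -- a row must not pair with itself
ORDER BY mt."user", mt.post_id, mt.id
"""

repeated = conn.execute(query).fetchall()
for row in repeated:
    print(row)
```

On the sample data this keeps bob's two comments on post 1002 and john's two comments on post 1001, and drops the three rows whose (user, post_id) pair occurs only once.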
-- T-SQL setup: the question's sample data in a table variable (the column is named usr here)
DECLARE @MyTable TABLE (id int, usr varchar(50), post_id int, comment varchar(50))
INSERT @MyTable (id, usr, post_id, comment) VALUES (0,'john@test.com',1001,'great article')
INSERT @MyTable (id, usr, post_id, comment) VALUES (1,'bob@test.com',1001,'nice post')
INSERT @MyTable (id, usr, post_id, comment) VALUES (2,'john@test.com',1002,'I agree')
INSERT @MyTable (id, usr, post_id, comment) VALUES (3,'john@test.com',1001,'thats cool')
INSERT @MyTable (id, usr, post_id, comment) VALUES (4,'bob@test.com',1002,'thanks for sharing')
INSERT @MyTable (id, usr, post_id, comment) VALUES (5,'bob@test.com',1002,'really helpful')
INSERT @MyTable (id, usr, post_id, comment) VALUES (6,'steve@test.com',1001,'spam post about pills')
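-- Join the table to itself on (usr, post_id). Every row matches itself once,
-- so a count above 1 means the same user has another comment on the same post.
SELECT
    T1.id,
    T1.usr,
    T1.post_id,
    T1.comment
FROM
    @MyTable T1
    INNER JOIN @MyTable T2
        ON T1.usr = T2.usr AND T1.post_id = T2.post_id
GROUP BY
    T1.id,
    T1.usr,
    T1.post_id,
    T1.comment
HAVING
    Count(T2.id) > 1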
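The query above is written for SQL Server. As a rough way to try the same join-then-count approach elsewhere, here is a sketch ported to SQLite through Python's sqlite3 module. The engine, the table name comments, and the added ORDER BY are assumptions for illustration and are not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the @MyTable table variable.
conn.execute(
    "CREATE TABLE comments (id INTEGER, usr TEXT, post_id INTEGER, comment TEXT)"
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?)",
    [
        (0, "john@test.com", 1001, "great article"),
        (1, "bob@test.com", 1001, "nice post"),
        (2, "john@test.com", 1002, "I agree"),
        (3, "john@test.com", 1001, "thats cool"),
        (4, "bob@test.com", 1002, "thanks for sharing"),
        (5, "bob@test.com", 1002, "really helpful"),
        (6, "steve@test.com", 1001, "spam post about pills"),
    ],
)

query = """
SELECT t1.id, t1.usr, t1.post_id, t1.comment
FROM comments t1
JOIN comments t2
  ON t1.usr = t2.usr AND t1.post_id = t2.post_id
GROUP BY t1.id, t1.usr, t1.post_id, t1.comment
HAVING COUNT(t2.id) > 1     -- 1 would mean the row only matched itself
ORDER BY t1.id
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the rows are grouped by T1's columns, each qualifying comment appears exactly once even when a user has three or more comments on the same post.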