SQL intersects with group by
Given these two tables / sets with different groups of elements, how can i find which groups in are how can I find groups in set1
cover more than one group in set2
?set1
that cannot be covered by one group in set2
?
eg. for the tables below A (1,2,5)
, the only group that covers s1
( 1 , 2 , 3) and s2
(2,3,4, 5 ). B
and C
are not answers because they are both included in the same group s2
.
I would rather use SQL
( Sql Server 2008 R2
).
Thank.
set1 set2
+---------+----------+ +---------+----------+
| group | item | | group | item |
`````````````````````+ `````````````````````+
| A | 1 | | s1 | 1 |
| A | 2 | | s1 | 2 |
| A | 5 | | s1 | 3 |
| B | 4 | | s2 | 2 |
| B | 5 | | s2 | 3 |
| C | 3 | | s2 | 4 |
| C | 5 | | s2 | 5 |
+---------+----------+ +---------+----------+
Use this sqlfiddle to try it: http://sqlfiddle.com/#!6/fac8a/3
Or use the script below to generate temporary tables to try out the answers:
create table #set1 (grp varchar(5),item int)
create table #set2 (grp varchar(5),item int)
insert into #set1 select 'a',1 union select 'a',2 union select 'a',5 union select 'b',4 union select 'b',5 union select 'c',3 union select 'c',5
insert into #set2 select 's1',1 union select 's1',2 union select 's1',3 union select 's2',2 union select 's2',3 union select 's2',4 union select 's2',5
select * from #set1
select * from #set2
--drop table #set1
--drop table #set2
source to share
Select groups from set1
, for which there are no groups in set2
, for which all elements in set1
exist in set2
:
select s1.grp from set1 s1
where not exists(
select * from set2 s2 where not exists(
select item from set1 s11
where s11.grp = s1.grp
except
select item from set2 s22
where s22.grp = s2.grp))
group by s1.grp
source to share
Ok. It's ugly, but it should work. I tried this in a violin. I think it can be done through the window, but I have to think about it.
Now ugly.
WITH d1 AS (
SELECT set1.grp
, COUNT(*) cnt
FROM set1
GROUP BY set1.grp
), d2 AS (
SELECT set1.grp grp1
, set2.grp grp2
, COUNT(set1.item) cnt
FROM set1
INNER JOIN set2
ON set1.item = set2.item
GROUP BY set1.grp
, set2.grp
)
SELECT grp
FROM d1
EXCEPT
SELECT d1.grp
FROM d1
INNER JOIN d2
ON d2.grp1 = d1.grp
AND d2.cnt = d1.cnt
source to share
Thanks for the comments. I believe the following edited query will work:
Select distinct grp1, initialRows, max(MatchedRows) from
(
select a.grp as grp1, b.grp as grp2
, count(distinct case when b.item is not null then a.item end) as MatchedRows
, d.InitialRows
from set1 a
left join set2 b
on a.item = b.item
left join
(select grp, count(distinct Item) as InitialRows from set1
group by grp) d
on a.grp = d.grp
group by a.grp, b.grp, InitialRows
) c
group by grp1, InitialRows
having max(MatchedRows) < InitialRows
source to share
I think this will be the trick. The subquery returns the set2 groups for the set1 group that match all elements in set1 by counting the matches and comparing the matches count to the number of set1 groups.
select s.grp from #set1 s
group by s.grp
having not exists (
select s2.grp from #set2 s2 inner join #set1 s1 on s2.item = s1.item
where s1.grp = s.grp
group by s2.grp
having count(s.item) = count(s2.item)
)
source to share
Basically the same as Robert Co
I didn't get this from his answer - I came up with it myself
select set1.group
from set1
except
select set1count.group
from ( select set1.group , count(*) as [count]
from set1
) as set1count
join ( select set1.group as [group1], count(*) as [count]
from set1
join set2
on set2.item = set1.item
group by set1.group, set2.group -- this is the magic
) as set1count
on set1count.group = set2count.[group1] -- note no set2.group match
and set1count.count = set12count.count -- the items in set1 are in at least on set2 group
source to share