Difference between social network requests
This is source from the Querys self-test of the DB class at Stanford Coursera:
High school students in their hometown decided to organize a social network using databases. So far, they have collected information on sixteen students in four grades, 9-12. Here's the Schematic:
Highschooler (ID, name, grade)
English: There is a high school student with a unique ID and given name in a specific class.
Friend (ID1, ID2)
Russian: Student with ID1 is friends with student with ID2. Friendship is mutual, therefore, if (123, 456) is in the "Friend" table, (456, 123).
Likes (ID1, ID2)
English: student with ID1 loves student with ID2. Liking someone is not necessarily mutual, so if (123, 456) is in the Likes table, there is no guarantee that (456, 123) is also present.
Database here
Hint: For every student A who likes student B where the two are not friends, find out if they have a mutual friend C (who can introduce them!). For all such trios, return name and grade A, B, and C.
I have a query that generates correct results and one that generates incorrect results, but I cannot logically figure out why the query below is incorrect. I have highlighted the problem causing the line, but I am having a hard time understanding it and would like to get a quick overview of the code.
Working request due to proposal and b.id not in (select id2 from friend where a.id=id1)
:
select distinct a.name, a.grade, b.name, b.grade, c.name, c.grade
from highschooler a, highschooler b, highschooler c, likes, friend
where a.id=likes.id1
and b.id=likes.id2
and a.id=friend.id1
-- this next line
and b.id not in (select id2 from friend where a.id=id1)
and b.id in (select id1 from friend where id2=c.id)
and a.id in (select id1 from friend where id2=c.id)
Invalid request due to suggestion and b.id <>friend.id2
:
select distinct a.name, a.grade, b.name, b.grade, c.name, c.grade
from highschooler a, highschooler b, highschooler c, likes, friend
where a.id=likes.id1
and b.id=likes.id2
and a.id=friend.id1
-- as opposed to this one
and b.id <>friend.id2
and b.id in (select id1 from friend where id2=c.id)
and a.id in (select id1 from friend where id2=c.id)
Query result 1:
Andrew 10 Cassandra 9 Gabriel 9
Austin 11 Jordan 12 Andrew 10
Austin 11 Jordan 12 Kyle 12
Query result 2:
Andrew 10 Cassandra 9 Gabriel 9
Brittany 10 Chris 10 Haley 10
Austin 11 Jordan 12 Andrew 10
Austin 11 Jordan 12 Kyle 12
Gabriel 11 Alexis 11 Jessica 11
source to share
These 2 conditions have very different meanings:
--1
and b.id not in (select id2 from friend where a.id=id1)
--2
and b.id <> friend.id2
Let's say that b.id
there is jack
, and select id2 from friend where a.id=id1
:
id2
---
alice
bob
In the first query for a row jack
in a table b
, you actually have the condition "jack is not one of (alice, bob)". Which gives true, so the line will be included in the output.
In the second query for a row jack
in a table, b
you are effectively trying to join the table friend
where id2
not jack
. In our example, there are 2 such records, one with alice
and one with bob
. Thus, in the output you will get 2 records for jack
, one for alice
and one for bob
.
In the first request, you either get a record with jack
from b
or you don't. In the second query, you get n
records for jack
, where n
is the number of records where id2
not jack
. If all records jack
are then you will not get records from b
. Completely different things.
source to share
Your problems seem to be mostly related to using ANSI-89 join
instead of using modern explicit join
s. If you were using modern join
it would be much easier to troubleshoot.
Your first version does what is now known as inner join
, while your second version looks for "not equal". This is why you get different results.
source to share