Difference between social network requests

This is source from the Querys self-test of the DB class at Stanford Coursera:

High school students in their hometown decided to organize a social network using databases. So far, they have collected information on sixteen students in four grades, 9-12. Here's the Schematic:

Highschooler (ID, name, grade) 

      

English: There is a high school student with a unique ID and given name in a specific class.

Friend (ID1, ID2) 

      

Russian: Student with ID1 is friends with student with ID2. Friendship is mutual, therefore, if (123, 456) is in the "Friend" table, (456, 123).

Likes (ID1, ID2) 

      

English: student with ID1 loves student with ID2. Liking someone is not necessarily mutual, so if (123, 456) is in the Likes table, there is no guarantee that (456, 123) is also present.

Database here

Hint: For every student A who likes student B where the two are not friends, find out if they have a mutual friend C (who can introduce them!). For all such trios, return name and grade A, B, and C.

I have a query that generates correct results and one that generates incorrect results, but I cannot logically figure out why the query below is incorrect. I have highlighted the problem causing the line, but I am having a hard time understanding it and would like to get a quick overview of the code.

Working request due to proposal and b.id not in (select id2 from friend where a.id=id1)

:

select distinct a.name, a.grade, b.name, b.grade, c.name, c.grade
from highschooler a, highschooler b, highschooler c, likes, friend
where a.id=likes.id1
and b.id=likes.id2
and a.id=friend.id1
-- this next line
and b.id not in (select id2 from friend where a.id=id1)
and b.id in (select id1 from friend where id2=c.id)
and a.id in (select id1 from friend where id2=c.id)

      

Invalid request due to suggestion and b.id <>friend.id2

:

select distinct a.name, a.grade, b.name, b.grade, c.name, c.grade
from highschooler a, highschooler b, highschooler c, likes, friend
where a.id=likes.id1
and b.id=likes.id2
and a.id=friend.id1
-- as opposed to this one
and b.id <>friend.id2
and b.id in (select id1 from friend where id2=c.id)
and a.id in (select id1 from friend where id2=c.id)

      

Query result 1:

Andrew 10 Cassandra 9 Gabriel 9

Austin 11 Jordan 12 Andrew 10

Austin 11 Jordan 12 Kyle 12

Query result 2:

Andrew 10 Cassandra 9 Gabriel 9

Brittany 10 Chris 10 Haley 10

Austin 11 Jordan 12 Andrew 10

Austin 11 Jordan 12 Kyle 12

Gabriel 11 Alexis 11 Jessica 11

+3


source to share


2 answers


These 2 conditions have very different meanings:

--1
and b.id not in (select id2 from friend where a.id=id1)
--2
and b.id <> friend.id2

      

Let's say that b.id

there is jack

, and select id2 from friend where a.id=id1

:

id2
---
alice
bob

      



In the first query for a row jack

in a table b

, you actually have the condition "jack is not one of (alice, bob)". Which gives true, so the line will be included in the output.

In the second query for a row jack

in a table, b

you are effectively trying to join the table friend

where id2

not jack

. In our example, there are 2 such records, one with alice

and one with bob

. Thus, in the output you will get 2 records for jack

, one for alice

and one for bob

.

In the first request, you either get a record with jack

from b

or you don't. In the second query, you get n

records for jack

, where n

is the number of records where id2

not jack

. If all records jack

are then you will not get records from b

. Completely different things.

+2


source


Your problems seem to be mostly related to using ANSI-89 join

instead of using modern explicit join

s. If you were using modern join

it would be much easier to troubleshoot.



Your first version does what is now known as inner join

, while your second version looks for "not equal". This is why you get different results.

-1


source







All Articles