Querying three disjoint datasets
I am retrieving three datasets that should be disjoint, meaning no row should appear in more than one of them. I expect three distinct rowsets because I have to perform a different operation on each one. However, the three queries together return more rows than the table contains, so some rows must be duplicated across them. Here are the three queries:
SELECT DISTINCT t1.*
FROM table1 t1
INNER JOIN table2 t2
    ON t2.ID = t1.ID
    AND t2.NAME = t1.NAME
    AND t2.ADDRESS <> t1.ADDRESS;

SELECT DISTINCT t1.*
FROM table1 t1
INNER JOIN table2 t2
    ON t2.ID = t1.ID
    AND t2.NAME <> t1.NAME
    AND t2.ADDRESS <> t1.ADDRESS;

SELECT DISTINCT t1.*
FROM table1 t1
INNER JOIN table2 t2
    ON t2.ID <> t1.ID
    AND t2.NAME = t1.NAME
    AND t2.ADDRESS <> t1.ADDRESS;
As you can see, I am selecting (in query order, and always requiring that the ADDRESS differs):
- Rows where both the ID and the NAME match
- Rows where the ID matches but the NAME does not
- Rows where the ID does not match but the NAME does
Summing the row counts of all three queries gives MORE rows than there are in table1, which I would have thought was logically impossible. It means some rows are duplicated across the result sets, and that gets in the way, because I have to execute a different command against each set and each row should receive exactly one command.
Can anyone find where I went wrong here?
Consider what happens if the name is not unique. Suppose you have the following data:
Table 1                        Table 2
ID  Name       Address         ID  Name       Address
0   Jim Smith  1111 A St       0   Jim Smith  2222 A St
1   Jim Smith  2222 B St       1   Jim Smith  3333 C St
Then query 1 gives you:
0   Jim Smith  1111 A St
1   Jim Smith  2222 B St
because rows 1 and 2 in table 1 match rows 1 and 2, respectively, in table 2 on both ID and name, while the addresses differ.
Query 2 returns nothing.
Query 3 gives you:
0   Jim Smith  1111 A St
1   Jim Smith  2222 B St
because row 1 in table 1 matches row 2 in table 2 (same name, different ID), and row 2 in table 1 matches row 1 in table 2. So you get 4 rows in total from table 1 when it contains only 2.
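The counts above can be checked with a minimal sketch. It assumes SQLite through Python's standard sqlite3 module (the question does not say which database is in use), and the column types are guesses. The QUERY template and the OPERATORS list are hypothetical helpers that only rebuild the three queries from the question.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER, NAME TEXT, ADDRESS TEXT);
    CREATE TABLE table2 (ID INTEGER, NAME TEXT, ADDRESS TEXT);
    INSERT INTO table1 VALUES (0, 'Jim Smith', '1111 A St');
    INSERT INTO table1 VALUES (1, 'Jim Smith', '2222 B St');
    INSERT INTO table2 VALUES (0, 'Jim Smith', '2222 A St');
    INSERT INTO table2 VALUES (1, 'Jim Smith', '3333 C St');
""")

# Template for the three queries; only the ID and NAME operators change.
QUERY = """
    SELECT DISTINCT t1.*
    FROM table1 t1
    INNER JOIN table2 t2
        ON t2.ID {id_op} t1.ID
        AND t2.NAME {name_op} t1.NAME
        AND t2.ADDRESS <> t1.ADDRESS
"""
# (ID operator, NAME operator) for queries 1, 2 and 3, in that order.
OPERATORS = [("=", "="), ("=", "<>"), ("<>", "=")]

counts = [
    len(conn.execute(QUERY.format(id_op=i, name_op=n)).fetchall())
    for i, n in OPERATORS
]
table1_rows = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]

print("rows per query:", counts)
print("total:", sum(counts), "rows in table1:", table1_rows)
```

DISTINCT only removes duplicates within a single result set, so it cannot prevent the same table1 row from showing up in two different queries.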
Are you sure that NAME and ID are unique in both tables?
If not, you may have a situation where, for example, table1 has the following:
NAME: Fred ID: 1
and table2 has the following:
NAME: Fred ID: 1
NAME: Fred ID: 2
In this case, the record in table1 will be returned by two of your queries: the one where both ID and NAME match, and the one where NAME matches but ID does not.
You might be able to narrow down the problem by intersecting each pair of queries to find out which rows are duplicated, for example:
SELECT DISTINCT t1.*
FROM table1 t1
INNER JOIN table2 t2
    ON t2.ID = t1.ID
    AND t2.NAME = t1.NAME
    AND t2.ADDRESS <> t1.ADDRESS
INTERSECT
SELECT DISTINCT t1.*
FROM table1 t1
INNER JOIN table2 t2
    ON t2.ID = t1.ID
    AND t2.NAME <> t1.NAME
    AND t2.ADDRESS <> t1.ADDRESS;
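The pairwise check can be automated along these lines. This is a sketch only, assuming SQLite through Python's sqlite3 module. The Fred example above lists no addresses, so the address values here are invented for illustration, and the build_query helper is hypothetical.

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER, NAME TEXT, ADDRESS TEXT);
    CREATE TABLE table2 (ID INTEGER, NAME TEXT, ADDRESS TEXT);
    -- Addresses are made up; the original example only gives NAME and ID.
    INSERT INTO table1 VALUES (1, 'Fred', '1 Main St');
    INSERT INTO table2 VALUES (1, 'Fred', '2 Oak Ave');
    INSERT INTO table2 VALUES (2, 'Fred', '3 Elm Rd');
""")

# (ID operator, NAME operator) for each of the three queries.
QUERIES = {1: ("=", "="), 2: ("=", "<>"), 3: ("<>", "=")}


def build_query(number):
    """Return the SQL text of query 1, 2 or 3 from the question."""
    id_op, name_op = QUERIES[number]
    return (
        "SELECT DISTINCT t1.* FROM table1 t1 "
        "INNER JOIN table2 t2 "
        f"ON t2.ID {id_op} t1.ID "
        f"AND t2.NAME {name_op} t1.NAME "
        "AND t2.ADDRESS <> t1.ADDRESS"
    )


# INTERSECT every pair of queries; a non-empty result is an overlap.
overlaps = {}
for a, b in combinations(sorted(QUERIES), 2):
    sql = build_query(a) + " INTERSECT " + build_query(b)
    overlaps[(a, b)] = conn.execute(sql).fetchall()

for pair, rows in overlaps.items():
    print(pair, rows)
```

With this data only the pair (1, 3) overlaps, which matches the Fred scenario: the table1 row matches one table2 row on ID and NAME and another on NAME alone.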
Even assuming T2.ID has a unique constraint, this result is still possible. Suppose that for a record in T1 there are two matching records in T2:
- Same name, same ID, different address
- Same name, different ID, different address
Then the same T1 record appears in both the first and third queries.
It is also possible for the same row to appear in both the second and third queries, if one T2 record has the same ID but a different name and another has a different ID but the same name.
If T2.ID is not guaranteed to be unique, you can get the same row from T1 in all three queries.
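The non-unique case can be illustrated with a small sketch, again assuming SQLite through Python's sqlite3 module. All names and addresses below are invented, and the QUERY template is a hypothetical helper that rebuilds the three queries from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER, NAME TEXT, ADDRESS TEXT);
    CREATE TABLE table2 (ID INTEGER, NAME TEXT, ADDRESS TEXT);
    -- Invented data: ID 1 occurs twice in table2, so it is not unique.
    INSERT INTO table1 VALUES (1, 'Ann', '1 Main St');
    INSERT INTO table2 VALUES (1, 'Ann', '2 Oak Ave');  -- same ID, same name
    INSERT INTO table2 VALUES (1, 'Bob', '2 Oak Ave');  -- same ID, other name
    INSERT INTO table2 VALUES (2, 'Ann', '3 Elm Rd');   -- other ID, same name
""")

QUERY = """
    SELECT DISTINCT t1.*
    FROM table1 t1
    INNER JOIN table2 t2
        ON t2.ID {id_op} t1.ID
        AND t2.NAME {name_op} t1.NAME
        AND t2.ADDRESS <> t1.ADDRESS
"""
# (ID operator, NAME operator) for queries 1, 2 and 3, in that order.
OPERATORS = [("=", "="), ("=", "<>"), ("<>", "=")]

results = [
    conn.execute(QUERY.format(id_op=i, name_op=n)).fetchall()
    for i, n in OPERATORS
]

for number, rows in enumerate(results, start=1):
    print("query", number, rows)
```

Here the single table1 row is returned by every query, so three rows come back in total from a table that holds one.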