Request three disjoint datasets

Question

I am retrieving three different datasets (or what should be sets of "unique" rows). I expect three disjoint rowsets, because I have to perform a different operation on each one. However, the three queries together return more rows than the table contains, which means some rows must be duplicated somewhere. Here are my three queries:

SELECT DISTINCT t1.*
    FROM table1 t1
    INNER JOIN table2 t2
        ON t2.ID = t1.ID
            AND t2.NAME = t1.NAME
            AND t2.ADDRESS <> t1.ADDRESS


SELECT DISTINCT t1.*
    FROM table1 t1
    INNER JOIN table2 t2
        ON t2.ID = t1.ID
            AND t2.NAME <> t1.NAME
            AND t2.ADDRESS <> t1.ADDRESS


SELECT DISTINCT t1.*
    FROM table1 t1
    INNER JOIN table2 t2
        ON t2.ID <> t1.ID
            AND t2.NAME = t1.NAME
            AND t2.ADDRESS <> t1.ADDRESS

As you can see, I am selecting (in query order):

A dataset where the ID and the name both match
A dataset where the ID matches but the name does NOT
A dataset where the ID does not match but the name does

When I sum the number of results returned by all three queries, I get MORE rows than there are in T1, which I would have thought was logically impossible. It also means that rows are being duplicated somewhere (if that is logically possible), and that gets in the way: I have to run a different command against each set, so a duplicated row would have more than one command run on it.

Can anyone find where I went wrong here?

0

sql sql-server

Organiccat Dec 24. 08:24 PM

5 answers

Are you sure the NAME and ID are unique in both tables?

If not, you could have a situation where, for example, table1 has the following:

NAME: Fred ID: 1

and table2 has the following:

NAME: Fred ID: 1

NAME: Fred ID: 2

In this case, the record in table1 will be returned by two of your queries: the one where ID and NAME both match, and the one where NAME matches but ID does not.

You might be able to narrow down the problem by intersecting each pair of queries to find out what the duplicates are, for example:

SELECT DISTINCT t1.*
    FROM table1 t1
    INNER JOIN table2 t2
        ON t2.ID = t1.ID
                AND t2.NAME = t1.NAME
                AND t2.ADDRESS <> t1.ADDRESS
INTERSECT
SELECT DISTINCT t1.*
    FROM table1 t1
    INNER JOIN table2 t2
        ON t2.ID = t1.ID
                AND t2.NAME <> t1.NAME
                AND t2.ADDRESS <> t1.ADDRESS

+1

Eric Rosenberger Dec 24. 08:34 pm

Even assuming T2.ID has a unique constraint, this scenario is still perfectly possible. Suppose that for a given record in T1 there are two matching records in T2:

Same name, same ID, different address
Same name, different ID, different address

Then the same T1 record will appear in both the first and the third query.

It is likewise possible to get the same row from both the second and the third query.

If T2.ID is not guaranteed to be unique, you can get the same row from T1 in all three queries.

+1

recursive Dec 24. '08 at 20:57

I think the last query might be the one that returns the extra rows, i.e. it relies only on the name matching in both tables (not the ID).

0

shahkalpesh Dec 24. '08 at 20:37

To find the offending data (and your logical hole), I would recommend the following.

(warning: pseudocode)

Limit each query's results to just SELECT id FROM ....

UNION ALL the result sets
COUNT(id)
GROUP BY id
HAVING COUNT(id) > 1

This will show records that match more than one subquery.
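
A rough sketch of that pseudocode in SQL Server syntax, using the table and column names from the question. The derived-table alias `matches` and the `query_count` column are illustrative names, not part of the original. It assumes ID identifies a row in table1; if it does not, select and group by all of the table1 columns instead.

```sql
-- List table1 IDs that are returned by more than one of the three queries.
-- DISTINCT keeps each ID at most once per query, and UNION ALL (rather than
-- UNION) preserves the repeats across queries so they can be counted.
SELECT ID, COUNT(*) AS query_count
FROM (
    -- Query 1: ID and NAME match
    SELECT DISTINCT t1.ID
        FROM table1 t1
        INNER JOIN table2 t2
            ON t2.ID = t1.ID
                AND t2.NAME = t1.NAME
                AND t2.ADDRESS <> t1.ADDRESS
    UNION ALL
    -- Query 2: ID matches, NAME does not
    SELECT DISTINCT t1.ID
        FROM table1 t1
        INNER JOIN table2 t2
            ON t2.ID = t1.ID
                AND t2.NAME <> t1.NAME
                AND t2.ADDRESS <> t1.ADDRESS
    UNION ALL
    -- Query 3: NAME matches, ID does not
    SELECT DISTINCT t1.ID
        FROM table1 t1
        INNER JOIN table2 t2
            ON t2.ID <> t1.ID
                AND t2.NAME = t1.NAME
                AND t2.ADDRESS <> t1.ADDRESS
) AS matches
GROUP BY ID
HAVING COUNT(*) > 1;
```

Any ID this returns belongs to a table1 row that lands in two or three of the sets, so the sets are not disjoint for that row.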

0

Chris nava Dec 25. '08 at 12:47

tvanfosson · Accepted Answer · 2008-12-24T20:55:56+0000

Consider the case where the name is not unique. Suppose you have the following data:

Table 1                        Table 2
ID    Name      Address        ID    Name      Address
0     Jim Smith 1111 A St      0     Jim Smith 2222 A St
1     Jim Smith 2222 B St      1     Jim Smith 3333 C St

Then query 1 gives you:

0     Jim Smith 1111 A St
1     Jim Smith 2222 B St

This is because rows 1 and 2 in table 1 match rows 1 and 2, respectively, in table 2.

Query 2 returns nothing.

Query 3 gives you:

0     Jim Smith 1111 A St
1     Jim Smith 2222 B St

This is because row 1 in table 1 matches row 2 in table 2, and row 2 in table 1 matches row 1 in table 2. So you get 4 rows back from table 1 when it contains only 2 rows.
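
To check this walkthrough yourself, here is a minimal sketch that loads the sample data and counts the rows each of the question's three queries returns. The column types and the `query_name`, `row_count`, and `q1` to `q3` names are assumptions made for illustration; the table and column names come from the question.

```sql
-- Sample data from the example above (column types are assumed).
CREATE TABLE table1 (ID INT, NAME VARCHAR(50), ADDRESS VARCHAR(50));
CREATE TABLE table2 (ID INT, NAME VARCHAR(50), ADDRESS VARCHAR(50));

INSERT INTO table1 (ID, NAME, ADDRESS) VALUES (0, 'Jim Smith', '1111 A St');
INSERT INTO table1 (ID, NAME, ADDRESS) VALUES (1, 'Jim Smith', '2222 B St');
INSERT INTO table2 (ID, NAME, ADDRESS) VALUES (0, 'Jim Smith', '2222 A St');
INSERT INTO table2 (ID, NAME, ADDRESS) VALUES (1, 'Jim Smith', '3333 C St');

-- Row count per query; the walkthrough above predicts 2, 0 and 2.
SELECT 'query 1' AS query_name, COUNT(*) AS row_count
FROM (
    SELECT DISTINCT t1.*
        FROM table1 t1
        INNER JOIN table2 t2
            ON t2.ID = t1.ID
                AND t2.NAME = t1.NAME
                AND t2.ADDRESS <> t1.ADDRESS
) AS q1
UNION ALL
SELECT 'query 2', COUNT(*)
FROM (
    SELECT DISTINCT t1.*
        FROM table1 t1
        INNER JOIN table2 t2
            ON t2.ID = t1.ID
                AND t2.NAME <> t1.NAME
                AND t2.ADDRESS <> t1.ADDRESS
) AS q2
UNION ALL
SELECT 'query 3', COUNT(*)
FROM (
    SELECT DISTINCT t1.*
        FROM table1 t1
        INNER JOIN table2 t2
            ON t2.ID <> t1.ID
                AND t2.NAME = t1.NAME
                AND t2.ADDRESS <> t1.ADDRESS
) AS q3;
```

The counts add up to 4 even though table1 holds only 2 rows, because DISTINCT removes duplicates within a single query but does nothing about a row that qualifies for two different queries.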
