How to update each row of a table with a random row from another table

I am creating my first de-identification script and am running into problems with my approach.

I have a table dbo.pseudonyms whose firstname column is filled with 200 rows of data. Every row in this column has a value (there are no NULLs). The table also has an id column (int, primary key, not null) numbered 1-200.

What I want to do is, in a single statement, repopulate my entire USERS table with firstname data randomly selected for each row from the pseudonyms table.

To generate a random number for the selection, I use ABS(Checksum(NewId())) % 200. Every time I run SELECT ABS(Checksum(NewId())) % 200, I get a numeric value in the range I am looking for, with no erratic behavior.

HOWEVER, when I use this formula in the following statement:

SELECT pn.firstname 
FROM DeIdentificationData.dbo.pseudonyms pn 
WHERE pn.id = ABS(Checksum(NewId())) % 200

      

I am getting VERY intermittent results. I would say that about 30% of the runs return a single name selected from the table (this is the expected result), about 30% return more than one result (which is baffling, since there are no duplicate values in the id column), and about 30% return NULL (even though there are no empty rows in the firstname column).

I have been searching for this specific issue for a long time but haven't figured it out yet. My guess is that the problem lies in using this formula as a pointer, but I am at a loss as to how to do it otherwise.

Thoughts?

1 answer


Why the query in your question returns unexpected results

The original query selects from Pseudonyms. The server scans each row of the table, takes the ID from that row, generates a random number, and compares the generated number with the ID.

If the randomly generated number for a particular row happens to equal the ID of that row, the row is returned in the result set. It is quite possible that no generated number ever equals its row's ID, and equally possible that the generated number matches the ID for several rows.

A little more detail:

  • The server fetches the row with ID=1.
  • It generates a random number, say 25. Why not? A decent random number.
  • Is 1 = 25? No => this row is not returned.
  • The server fetches the row with ID=2.
  • It generates a random number, say 125. Why not? A decent random number.
  • Is 2 = 125? No => this row is not returned.
  • And so on...
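
The per-row evaluation described above can be observed directly. The following is a minimal sketch, not part of the original question: @Pseudonyms and the generated names are hypothetical stand-ins for dbo.pseudonyms, and it assumes sys.all_objects has at least 200 rows. The first query shows a different random number next to every row; the second computes the number once into a variable, which is one way to get exactly one row back. The modulo is applied before ABS so that ABS never sees the smallest int value.

```sql
-- Hypothetical stand-in for dbo.pseudonyms: 200 rows with ids 1..200.
DECLARE @Pseudonyms TABLE (id int PRIMARY KEY, firstname varchar(50) NOT NULL);

INSERT INTO @Pseudonyms (id, firstname)
SELECT N.n, 'Name' + CAST(N.n AS varchar(10))
FROM
(
    SELECT TOP(200)
        ROW_NUMBER() OVER (ORDER BY o.object_id) AS n
    FROM sys.all_objects AS o
    ORDER BY o.object_id
) AS N;

-- The random expression is evaluated once per scanned row,
-- so every row is compared against its own random number.
SELECT
    P.id
    ,ABS(CHECKSUM(NEWID()) % 200) AS rnd
FROM @Pseudonyms AS P;

-- Computing the number once, before the query, yields exactly one row.
-- Adding 1 shifts the range from 0..199 to 1..200 to match the ids.
DECLARE @rnd int = 1 + ABS(CHECKSUM(NEWID()) % 200);

SELECT P.firstname
FROM @Pseudonyms AS P
WHERE P.id = @rnd;
```

The variable only helps when a single random row is needed; picking a different random row for every row of another table still needs a per-row random number.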

Here is a complete SQL Fiddle solution

Sample data

DECLARE @VarPseudonyms TABLE (ID int IDENTITY(1,1), PseudonymName varchar(50) NOT NULL);
DECLARE @VarUsers TABLE (ID int IDENTITY(1,1), UserName varchar(50) NOT NULL);

INSERT INTO @VarUsers (UserName)
SELECT TOP(1000)
    'UserName' AS UserName
FROM sys.all_objects
ORDER BY sys.all_objects.object_id;

INSERT INTO @VarPseudonyms (PseudonymName)
SELECT TOP(200)
    'PseudonymName'+CAST(ROW_NUMBER() OVER(ORDER BY sys.all_objects.object_id) AS varchar) AS PseudonymName
FROM sys.all_objects
ORDER BY sys.all_objects.object_id;

      

The Users table has 1000 rows, each with the same UserName. The Pseudonyms table has 200 rows, each with a different PseudonymName:

SELECT * FROM @VarUsers;
ID   UserName
--   --------
1    UserName
2    UserName
...
999  UserName
1000 UserName

SELECT * FROM @VarPseudonyms;
ID   PseudonymName
--   -------------
1    PseudonymName1
2    PseudonymName2
...
199  PseudonymName199
200  PseudonymName200

      

First try



I tried the direct approach first. For each row in Users I want to get one random row from Pseudonyms:

SELECT
    U.ID
    ,U.UserName
    ,CA.PseudonymName
FROM
    @VarUsers AS U
    CROSS APPLY
    (
        SELECT TOP(1)
            P.PseudonymName
        FROM @VarPseudonyms AS P
        ORDER BY CRYPT_GEN_RANDOM(4)
    ) AS CA
;

      

It turns out the optimizer is too smart: this produced a random PseudonymName, but the same one for every User, which I didn't expect:

ID   UserName   PseudonymName
1    UserName   PseudonymName181
2    UserName   PseudonymName181
...
999  UserName   PseudonymName181
1000 UserName   PseudonymName181

      

So I modified this approach a bit: first I generate a random number for each row in Users, then I use that number to find the Pseudonym with this ID for each row in Users, using CROSS APPLY.

CTE_Users has an extra column with a random number from 1 to 200. In CTE_Joined we select a row from Pseudonyms for each User. Finally, we UPDATE the original Users table.

Final solution

WITH
CTE_Users
AS
(
    SELECT
        U.ID
        ,U.UserName
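        -- Random number in [1, 201); truncating it to int later gives 1..200.
        -- Dividing by 2^32 (not 2^32 - 1) keeps the lowest raw value from mapping to 0.
        ,1 + 200 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967296.0 + 0.5) AS rnd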
    FROM @VarUsers AS U
)
,CTE_Joined
AS
(
    SELECT
        CTE_Users.ID
        ,CTE_Users.UserName
        ,CA.PseudonymName
    FROM
        CTE_Users
        CROSS APPLY
        (
            SELECT P.PseudonymName
            FROM @VarPseudonyms AS P
            WHERE P.ID = CAST(CTE_Users.rnd AS int)
        ) AS CA
)
UPDATE CTE_Joined
SET UserName = PseudonymName;
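
To see how the rnd expression maps raw values onto IDs, here is a small worked sketch. The names v and raw_int are hypothetical, and the three inputs are the extremes and midpoint of a 4-byte signed int, standing in for CAST(CRYPT_GEN_RANDOM(4) AS int). It compares two divisors: 4294967296.0 (2^32), which I assume is the intended range, and 4294967295.0 (2^32 - 1).

```sql
-- Worked check of the scaling arithmetic at the edges of the int range.
SELECT
    v.raw_int
    -- Divisor 2^32: raw/2^32 lies in [-0.5, 0.5), so the IDs are 1, 101, 200.
    ,CAST(1 + 200 * (v.raw_int / 4294967296.0 + 0.5) AS int) AS id_div_2_32
    -- Divisor 2^32 - 1: the lowest raw value lands just below 1 and truncates
    -- to 0, which matches no pseudonym, so the IDs are 0, 101, 200.
    ,CAST(1 + 200 * (v.raw_int / 4294967295.0 + 0.5) AS int) AS id_div_2_32_minus_1
FROM (VALUES (-2147483648), (0), (2147483647)) AS v(raw_int);
```

With the smaller divisor, only one raw value out of roughly four billion produces 0. A User row that hits it finds no match in the CROSS APPLY and is left unchanged by the UPDATE.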

      

Results

SELECT * FROM @VarUsers;
ID   UserName
1    PseudonymName41
2    PseudonymName132
3    PseudonymName177
...
998  PseudonymName60
999  PseudonymName141
1000 PseudonymName157

      
