Update table with random fields

I am working on a "data obfuscator" script. I want to update every record in the table with fake data. I have a table with locations and trying to select a random record from that table to update the records in my personal table. So the SQL is like this

UPDATE Person
SET    City = (SELECT TOP 1 City
               FROM   z.CityStateZip c2
               ORDER  BY Newid()) 

      

The problem is that he only chooses one city, rather than choosing a random City for each person. I have also tried

(SELECT TOP 1 City FROM z.CityStateZip c2 ORDER BY NEWID()), PersonId, from Person

      

But it still only picks one city (I thought the sub-queries were done once for each entry) instead of what I want is a random city for each entry.

I've also tried combining with the same results, only one city ever picked ....

SELECT t.City,
       PersonId
FROM   Person
       INNER JOIN (SELECT TOP 1 City
                   FROM   z.CityStateZip c2
                   ORDER  BY Newid()) t
               ON 1 = 1 

      

I tried to include this statement in a function, but SQL Server won't let me use it NEWID()

inside a function.

Answer

I modified Giorgi's answer and the answer to a related question and came up with this, it's super fast! I changed how I randomly selected a city. The NewId () order was the problem. So a person has 5k records and CityStateZip has ~ 30K, I took it from 40 seconds to 4 ... (now even faster without count subquery)

DECLARE @count bigint 
SELECT @count = count(*) from z.CityStateZip

UPDATE p
SET p.City= b.City
FROM Person p
CROSS APPLY (SELECT TOP 1 City -- if Id is unique, top 1 shouldn't be necessary
             FROM z.CityStateZip 
             WHERE p.SomeKey = p.SomeKey and -- ... the magic! ↓↓↓
             Id = (Select ABS(Checksum(NewID()) % @count))) b

      

+3


source to share


1 answer


You have to force the database engine to evaluate a new value for each row. You can do this by adding a dummy where where element to the external table, for example:

DECLARE @city TABLE(ID INT IDENTITY(1, 1), City VARCHAR(100))

INSERT INTO @city VALUES
('Dallas'),
('New York'),
('Washington'),
('Las Vegas')


DECLARE @random TABLE(ID INT IDENTITY(1, 1), City VARCHAR(100))

INSERT INTO @random VALUES
('Manchester'),
('London'),
('Oxford'),
('Liverpool')


SELECT * FROM @city c
CROSS APPLY(SELECT TOP 1 * FROM @random r WHERE c.ID = c.ID ORDER BY NEWID()) ca

      



if you remove WHERE c.ID = c.ID

you get the same value for all rows.

+1


source







All Articles