Removing duplicate row without unique identifier

Question

Removing duplicate row without unique identifier

I have some data in a table that looks something like this:

table stockData
(
tickId int not null,
timestamp datetime not null,
price decimal(18,5) not null
)

Neither tickId nor timestamp are unique, however the combination of tickId and timestamp must be unique.

I have some duplicate data in my table and I am trying to delete them. However, I come to the conclusion that there is not enough data with data for me to distinguish one row from another, and basically I was unable to delete only one of the duplicate rows. I am guessing that I will need to introduce some sort of ids column that will help me identify one row from another.

Is this correct, or is there some magical way to remove one but not two duplicate data with a query?

EDIT Edited to clarify that the tickId and timestamp combo must be unique, but this is not due to duplicate data.

+3

sql sql-server

steve8918 March 25 12 at 18:05

source to share

3 answers

select distinct * into temp_table from source_table

(this table will be created for you)

remove from temp_table (which you don't need)

insert into sorce_table
select * from temp_table

+3

Jester March 25 12 at 18:10

source to share

I may not be understanding your question correctly, but if it is guaranteed to have "tickId" and "timestamp" specified, then how do you have duplicate data in your table? Could you give an example or two of what you mean?

However, if you have duplicates of all three columns inside a table, the following script might work. Test this and make a database backup before implementing as I just built it.

declare @x table 
(
    tickId int not null,
    timestamp datetime not null,
    price decimal(18,5) not null
)

insert into @x (tickId, timestamp, price)
select tickId,
    timestamp,
    price
from stockData
group by tickId,
         timestamp,
         price
having count(*) > 1

union 

select tickId,
       timestamp,
       price
from stockData
group by tickId,
         timestamp,
         price
having count(*) = 1

delete 
from stockData

insert into stockData (tickId, timestamp, price)
select tickId,
       timestamp,
       price
from @x

alter table stockData add constraint
    pk_StockData primary key clustered (tickid, timestamp)

0

NuNn DaDdY March 25 12 at 18:11

source to share

Sergey Rybalkin · Accepted Answer · 2012-03-25T18:14:48+0000

Here is a query that removes duplicates and leaves exactly one copy of each unique row. It will work with SQL Server 2005 or higher:

WITH Dups AS
(
  SELECT tickId, timestamp, price,
    ROW_NUMBER() OVER(PARTITION BY tickid, timestamp ORDER BY (SELECT 0)) AS rn
  FROM stockData
)
DELETE FROM Dups WHERE rn > 1

Removing duplicate row without unique identifier

More articles: