Removing duplicate row without unique identifier
I have some data in a table that looks something like this:
table stockData
(
tickId int not null,
timestamp datetime not null,
price decimal(18,5) not null
)
Neither tickId nor timestamp are unique, however the combination of tickId and timestamp must be unique.
I have some duplicate data in my table and I am trying to delete them. However, I come to the conclusion that there is not enough data with data for me to distinguish one row from another, and basically I was unable to delete only one of the duplicate rows. I am guessing that I will need to introduce some sort of ids column that will help me identify one row from another.
Is this correct, or is there some magical way to remove one but not two duplicate data with a query?
EDIT Edited to clarify that the tickId and timestamp combo must be unique, but this is not due to duplicate data.
source to share
Here is a query that removes duplicates and leaves exactly one copy of each unique row. It will work with SQL Server 2005 or higher:
WITH Dups AS
(
SELECT tickId, timestamp, price,
ROW_NUMBER() OVER(PARTITION BY tickid, timestamp ORDER BY (SELECT 0)) AS rn
FROM stockData
)
DELETE FROM Dups WHERE rn > 1
source to share
I may not be understanding your question correctly, but if it is guaranteed to have "tickId" and "timestamp" specified, then how do you have duplicate data in your table? Could you give an example or two of what you mean?
However, if you have duplicates of all three columns inside a table, the following script might work. Test this and make a database backup before implementing as I just built it.
declare @x table
(
tickId int not null,
timestamp datetime not null,
price decimal(18,5) not null
)
insert into @x (tickId, timestamp, price)
select tickId,
timestamp,
price
from stockData
group by tickId,
timestamp,
price
having count(*) > 1
union
select tickId,
timestamp,
price
from stockData
group by tickId,
timestamp,
price
having count(*) = 1
delete
from stockData
insert into stockData (tickId, timestamp, price)
select tickId,
timestamp,
price
from @x
alter table stockData add constraint
pk_StockData primary key clustered (tickid, timestamp)
source to share