SQL Delete low counts
I have a table with this data:
Id Qty
-- ---
A 1
A 2
A 3
B 112
B 125
B 109
But I only want to keep the maximum value for each Id. The maximum value for A is 3 and for B it is 125. How can I identify (and remove) the other rows?
The final table should look like this:
Id Qty
-- ---
A 3
B 125
I'm running MySQL 4.1.
Oh wait. I found a simpler solution: select the maximum value for each id (GROUP BY id) into a new table, drop the original table, and rename the new table into its place.
CREATE TABLE tabletemp LIKE `table`;
INSERT INTO tabletemp SELECT id, MAX(qty) FROM `table` GROUP BY id;
DROP TABLE `table`;
RENAME TABLE tabletemp TO `table`;
(Note the backticks: TABLE is a reserved word in MySQL, so a table actually named "table" has to be quoted.)
Thanks everyone!
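If you want to sanity-check the table-swap approach before running it on real data, here is a small sketch using Python's sqlite3 module (my assumptions: SQLite instead of MySQL, so `CREATE TABLE ... AS SELECT` stands in for `CREATE TABLE ... LIKE`, and the table is named `t` to sidestep the reserved word):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (id TEXT, qty INTEGER);
    INSERT INTO t VALUES ('A',1),('A',2),('A',3),('B',112),('B',125),('B',109);

    -- Build the replacement table holding only the per-id maximums,
    -- then swap it in place of the original.
    CREATE TABLE tabletemp AS SELECT id, MAX(qty) AS qty FROM t GROUP BY id;
    DROP TABLE t;
    ALTER TABLE tabletemp RENAME TO t;
""")
print(con.execute("SELECT id, qty FROM t ORDER BY id").fetchall())
# → [('A', 3), ('B', 125)]
```

One caveat of the swap: the rebuilt table keeps only the selected columns, so indexes, constraints, and any other columns on the original are lost.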
Try this in SQL Server (note that the alias has to follow DELETE when you join):
delete o
from tbl o
left outer join
    (select max(qty) anz, id
     from tbl i
     group by i.id) k on o.id = k.id and k.anz = o.qty
where k.id is null;
Version 2 for MySQL ... Can anyone check this?:
delete from tbl
where concat(id, qty) not in
    (select concat(id, anz) from
        (select max(qty) anz, id
         from tbl i
         group by i.id) x)
(MySQL requires an alias on the derived table, here x, and does not allow an alias on the target of a single-table DELETE.)
Explanation:
Since I could not use a join (see the comments about MySQL's limited support for joins in DELETE / UPDATE / INSERT), I moved the subquery into an IN (a, b, c) clause.
Inside the IN clause I can use a subquery, but that subquery is only allowed to return one field. So, to filter out all the rows that are not the maximum, I need to combine both fields into one so I can return it inside the IN clause. Basically, the query inside IN returns only the largest ID + QTY combinations. To compare against the main table, I also have to apply the same CONCAT outside, so the data on both sides has the same form.
Basically the In clause contains: ("A3", "B125")
Disclaimer: The above query is "evil!" because it applies a function (CONCAT) to the compared fields. This makes any index on those fields almost useless. You should never write a query this way if it will run on a regular basis; I just wanted to bend it until it worked on MySQL.
An example of the same bad design (get all orders from the last 2 weeks):
select ... from orders where orderday + 14 > now()
You should always write instead:
select ... from orders where orderday > now() - 14
The difference is subtle: version 2 only has to do the math once and can use the index, while version 1 has to do the math for every single row in the orders table. You can forget about using the index ...
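For anyone who wants to check the concat trick without a MySQL instance handy, here is a sketch using Python's sqlite3 module (my assumptions: SQLite's || operator stands in for CONCAT, and the derived table is aliased as m):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (id TEXT, qty INTEGER)")
con.executemany("INSERT INTO tbl VALUES (?,?)",
                [("A",1),("A",2),("A",3),("B",112),("B",125),("B",109)])

# Delete every row whose id||qty is not among the per-id maximums.
# The inner SELECT is wrapped in a derived table (aliased m) so the
# DELETE does not reference tbl directly in its own subquery.
con.execute("""
    DELETE FROM tbl
    WHERE id || qty NOT IN
          (SELECT id || anz FROM
              (SELECT MAX(qty) AS anz, id FROM tbl GROUP BY id) AS m)
""")
print(con.execute("SELECT id, qty FROM tbl ORDER BY id").fetchall())
# → [('A', 3), ('B', 125)]
```

The IN list here really does contain just "A3" and "B125", matching the explanation above.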
I would try this:
delete from T
where exists (
    select * from T as T2
    where T2.Id = T.Id
      and T2.Qty > T.Qty
);
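This correlated EXISTS delete is easy to verify; below is a sketch using Python's sqlite3 module (note that SQLite, unlike MySQL 4.1, allows the target table to be referenced inside the subquery):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T (Id TEXT, Qty INTEGER)")
con.executemany("INSERT INTO T VALUES (?,?)",
                [("A",1),("A",2),("A",3),("B",112),("B",125),("B",109)])

# Delete any row for which a higher Qty exists under the same Id;
# only the per-Id maximum survives.
con.execute("""
    DELETE FROM T
    WHERE EXISTS (SELECT * FROM T AS T2
                  WHERE T2.Id = T.Id AND T2.Qty > T.Qty)
""")
print(con.execute("SELECT Id, Qty FROM T ORDER BY Id").fetchall())
# → [('A', 3), ('B', 125)]
```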
For those who find this question in the future: this can also be done without a temporary table (in SQL Server 2005 and later).
It does not require a join, and it has advantages over the temporary-table approach if the table has dependencies.
with Tranked(Id, Qty, rk) as (
    select Id, Qty,
           rank() over (partition by Id
                        order by Qty desc)
    from T
)
delete from Tranked
where rk > 1;
You will need to go through another table (among other things because a single-statement delete is impossible here in MySQL: you cannot delete from a table and reference that same table in a subquery).
BEGIN;
create temporary table tmp_del select id, max(qty) as qty from the_tbl group by id;
delete the_tbl from the_tbl, tmp_del where
    the_tbl.id = tmp_del.id and the_tbl.qty <> tmp_del.qty;
drop table tmp_del;
COMMIT;
(The GROUP BY is required to get one maximum per id, and the delete has to remove the rows that do NOT match the maximum.)
MySQL 4.0 and later supports a simple multi-table syntax for DELETE:
DELETE t1 FROM MyTable t1 JOIN MyTable t2 ON t1.id = t2.id AND t1.qty < t2.qty;
This joins each row to every other row with the same id, and deletes the row with the smaller qty in each pairing. Once all pairings have been processed, only the row with the largest qty per id remains undeleted.
If an id has only one row, the statement still works, because that single row trivially holds the maximum value.
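SQLite has no multi-table DELETE, but the same pairing logic can be checked by pushing the self-join into a rowid subquery; here is a sketch with Python's sqlite3 module (the rowid rewrite is my assumption, not the MySQL syntax above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyTable (id TEXT, qty INTEGER)")
con.executemany("INSERT INTO MyTable VALUES (?,?)",
                [("A",1),("A",2),("A",3),("B",112),("B",125),("B",109)])

# Join every row to the rows sharing its id that have a larger qty;
# any row appearing on the smaller side of some pairing is deleted.
con.execute("""
    DELETE FROM MyTable
    WHERE rowid IN (SELECT t1.rowid
                    FROM MyTable t1
                    JOIN MyTable t2
                      ON t1.id = t2.id AND t1.qty < t2.qty)
""")
print(con.execute("SELECT id, qty FROM MyTable ORDER BY id").fetchall())
# → [('A', 3), ('B', 125)]
```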
FWIW, I just tried my solution using MySQL 5.0.75 on a 2.40GHz MacBook Pro. I inserted 1 million rows of synthetic data with varying numbers of rows per "group":
- 2 rows per id: completed in 26.78 sec
- 5 rows per id: completed in 43.18 sec
- 10 rows per id: completed in 1 min 3.77 sec
- 100 rows per id: completed in 6 min 46.60 sec
- 1000 rows per id: did not finish before I killed it