SQL Delete low counts

I have a table with this data:

Id     Qty  
--     ---  
A       1  
A       2  
A       3  
B       112  
B       125  
B       109  

      

But I should keep only the maximum value for each id. The maximum value for A is 3 and for B it is 125. How can I find (and remove) the other rows?

The final table should look like this:

Id     Qty  
--     ---   
A       3  
B       125  

      

I'm running MySQL 4.1.



5 answers


Oh wait, I found a simpler solution: select all the maximum values (GROUP BY id) into a new table, drop the original table, and rename the new table to take its place.

CREATE TABLE tabletemp LIKE `table`;  
INSERT INTO tabletemp SELECT id, MAX(qty) FROM `table` GROUP BY id;  
DROP TABLE `table`;
RENAME TABLE tabletemp TO `table`; 
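For anyone who wants to try the rebuild approach quickly, here is a sketch of the same steps in Python with an in-memory SQLite database (the table name `tbl` is a stand-in; `CREATE TABLE ... LIKE` and `RENAME TABLE` are MySQL-specific, so SQLite's `CREATE TABLE ... AS SELECT` and `ALTER TABLE ... RENAME TO` are used instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id TEXT, qty INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)",
                 [("A", 1), ("A", 2), ("A", 3), ("B", 112), ("B", 125), ("B", 109)])

# Rebuild: copy only the per-id maxima into a new table, drop the original,
# then rename the new table into place.
conn.execute("CREATE TABLE tabletemp AS "
             "SELECT id, MAX(qty) AS qty FROM tbl GROUP BY id")
conn.execute("DROP TABLE tbl")
conn.execute("ALTER TABLE tabletemp RENAME TO tbl")

rows = conn.execute("SELECT id, qty FROM tbl ORDER BY id").fetchall()
print(rows)  # [('A', 3), ('B', 125)]
```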

      



Thanks everyone!



Try this in SQL Server:

delete o
from tbl o
left outer join 
(select max(qty) as anz, id
from tbl i
group by i.id) k on o.id = k.id and k.anz = o.qty
where k.id is null

      

Version 2 for MySQL... can anyone check this? (Note: MySQL does not allow an alias on the target table of a single-table DELETE, and the derived table needs an alias.)

delete from tbl
where concat(id, qty) not in 
    (select concat(id, anz) from (select max(qty) as anz, id
    from tbl i
    group by i.id) as k)
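The NOT IN / CONCAT workaround can be sketched in SQLite via Python for illustration (SQLite spells string concatenation `||`, and the derived table carries the alias MySQL also requires):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id TEXT, qty INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)",
                 [("A", 1), ("A", 2), ("A", 3), ("B", 112), ("B", 125), ("B", 109)])

# Keep only rows whose concatenated id+qty matches a per-id maximum.
conn.execute("""
    DELETE FROM tbl
    WHERE id || qty NOT IN
        (SELECT id || anz FROM
            (SELECT MAX(qty) AS anz, id FROM tbl GROUP BY id) AS k)
""")

rows = conn.execute("SELECT id, qty FROM tbl ORDER BY id").fetchall()
print(rows)  # [('A', 3), ('B', 125)]
```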

      

Explanation:

Since I had to avoid joins (see the comments about MySQL's limited support for joins combined with DELETE/UPDATE/INSERT), I moved the subquery into an IN (a, b, c) clause.



Inside the IN clause I can use a subquery, but that subquery is only allowed to return one column. So, to filter out all the rows that are not the maximum, I need to combine both columns into a single value that I can return inside the IN clause. The subquery inside IN therefore returns only the concatenated id + largest qty per group. To compare it against the main table, I also have to apply the same CONCAT on the outside, so the data on both sides has the same form.

Basically the In clause contains: ("A3", "B125")

Disclaimer: The above query is "evil!" because it applies a function (CONCAT) to the columns being compared. That makes any index on those columns almost useless. You should never write a query this way if it will run on a regular basis; I just wanted to bend it until it worked on MySQL.

An example of this bad design (get all orders from the last 2 weeks): select ... from orders where orderday + 14 > now()

You should always write it as: select ... from orders where orderday > now() - 14

The difference is subtle: version 2 only has to do the math once and can then use the index, while version 1 has to do the math for every single row in the orders table, so you can forget about using the index.
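The index effect can be made visible with a small sketch (SQLite stand-in via Python, with integer "day numbers" instead of real dates; `EXPLAIN QUERY PLAN` shows whether the index is usable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderday INTEGER)")
conn.execute("CREATE INDEX idx_orderday ON orders(orderday)")
today = 20000  # hypothetical "now" as a day number

# Version 1: the expression on the column hides it from the index -> full scan.
bad = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE orderday + 14 > ?",
    (today,)).fetchall()[0][3]

# Version 2: bare column compared to a constant -> index range search.
good = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE orderday > ?",
    (today - 14,)).fetchall()[0][3]

print(bad)   # a SCAN over the whole table
print(good)  # a SEARCH using idx_orderday
```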



I would try this:

delete from T
where exists (
  select * from T as T2
  where T2.Id = T.Id
  and T2.Qty > T.Qty
);
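For anyone wanting to verify it quickly, here is the query run against the question's data in an in-memory SQLite database (Python sketch; the table name T is just this answer's placeholder):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Id TEXT, Qty INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, ?)",
                 [("A", 1), ("A", 2), ("A", 3), ("B", 112), ("B", 125), ("B", 109)])

# Delete every row for which a larger-Qty row with the same Id exists.
conn.execute("""
    DELETE FROM T
    WHERE EXISTS (
        SELECT * FROM T AS T2
        WHERE T2.Id = T.Id AND T2.Qty > T.Qty
    )
""")

rows = conn.execute("SELECT Id, Qty FROM T ORDER BY Id").fetchall()
print(rows)  # [('A', 3), ('B', 125)]
```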

      

For those who may have a similar question in the future: MySQL may support this one day (it already works in SQL Server 2005 and later).

It does not require a join, and it has advantages over using a temporary table if the table has dependencies:

with Tranked(Id,Qty,rk) as (
  select
    Id, Qty,
    rank() over (
      partition by Id
      order by Qty desc
    )
  from T
)
  delete from Tranked
  where rk > 1;

      



You will need to go through another table (among other things, a single DELETE statement is simply impossible here in MySQL: you cannot delete from a table while using that same table in a subquery).

BEGIN;
create temporary table tmp_del select id, max(qty) as qty from the_tbl group by id;
delete the_tbl from the_tbl, tmp_del where 
  the_tbl.id = tmp_del.id and the_tbl.qty < tmp_del.qty;
drop table tmp_del;
COMMIT;

      



MySQL 4.0 and later supports a multi-table syntax for DELETE:

DELETE t1 FROM MyTable t1 JOIN MyTable t2 ON t1.id = t2.id AND t1.qty < t2.qty;

      

This joins each row with a given id to every other row with the same id, and in each pairing deletes the row with the smaller qty. Once all pairings have been processed, the row with the largest qty per id group is the only one left undeleted.

If there is only one row with a given id, it still works, because a single row trivially holds its group's maximum.
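SQLite has no multi-table DELETE, but the pairing logic above can be illustrated with the same self-join written as a SELECT, which lists exactly the rows the DELETE would remove (Python sketch against the question's data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id TEXT, qty INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)",
                 [("A", 1), ("A", 2), ("A", 3), ("B", 112), ("B", 125), ("B", 109)])

# Every row that pairs with a larger-qty row of the same id would be deleted.
doomed = conn.execute("""
    SELECT DISTINCT t1.id, t1.qty
    FROM MyTable t1 JOIN MyTable t2
      ON t1.id = t2.id AND t1.qty < t2.qty
    ORDER BY t1.id, t1.qty
""").fetchall()
print(doomed)  # every row except ('A', 3) and ('B', 125)
```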


FWIW, I just tried my solution using MySQL 5.0.75 on a 2.40GHz MacBook Pro. I inserted 1 million rows of synthetic data, with a varying number of rows per id "group":

  • 2 rows per id: completed in 26.78 sec.
  • 5 rows per id: completed in 43.18 sec.
  • 10 rows per id: completed in 1 min 3.77 sec.
  • 100 rows per id: completed in 6 min 46.60 sec.
  • 1000 rows per id: I killed it before it completed.






