Customers who ordered "X" and did not order "Y" (MySQL)

Let's assume there is a database with these tables:

Orders

OrderId  -  CustomerId
----------------------
100      -  1
101      -  2
102      -  1

      

Order_Details

DetailId  -  OrderId -  ProductId
---------------------------------
1         -  100     -  prod1
2         -  100     -  prod2
3         -  101     -  prod1
4         -  102     -  prod3

      

I need to get a list of CustomerIds who bought product "1" but never bought product "2".

With the provided sample data, I should get CustomerId 2.

I wrote this SQL clause, but the real database is really big and very slow ... Is there a way to improve my suggestion?

SELECT DISTINCT(o.CustomerId) 
FROM orders o 
JOIN order_details od ON od.orderId = o.orderId
WHERE od.productId = 'prod1'
AND o.customerId NOT IN
  (SELECT DISTINCT(o.CustomerId) FROM order_details od2 WHERE
  od2.productId = 'prod2');

      

Thank!

+3


source to share


3 answers


The use of the operator is IN

not recommended for a large list of values. You can use EXISTS

or add more joins to your query instead .

Using Exists:

SELECT DISTINCT(o.CustomerId) 
FROM orders o 
JOIN order_details od ON od.orderId = o.orderId
WHERE od.productId = 'prod1'
AND NOT EXISTS
(
    SELECT 1 
    FROM orders o2 
    JOIN order_details od2 ON od2.orderId = o2.orderId
    WHERE o2.customerId = o.customerId 
    AND od2.productId = 'prod2'
);

      



Using unions:

SELECT DISTINCT(o.CustomerId) 
FROM orders o 
INNER JOIN order_details od ON od.orderId = o.orderId
LEFT JOIN  order_details od2 ON od2.orderId = o.orderId AND od2.productId = 'prod1'
WHERE od.productId = 'prod1'
AND od2.DetailId IS NULL -- Assuming that the detailId column does not allow null values.

      

Note: The code is written right here and is not tested, I may have made some mistakes. To get better quality answers, I suggest to always include DDL + DML on sampled data in your request so that people can validate the answers before posting them.

+2


source


You can convert the engine sub-query

at exists

https://dev.mysql.com/doc/refman/5.5/en/subquery-optimization-with-exists.html

SELECT DISTINCT o.CustomerId
FROM orders o 
JOIN order_details od ON od.orderId = o.orderId
WHERE od.productId = 'prod1'
and not exists(
 select 1 from order_details od2 
 WHERE o.customerId = od2.customerId
 and od2.productId = 'prod2'
)

      

Now you also need indexing for this.

Assuming that orderId

is the primary key on orders

and is already indexed on order_details

, you need



alter table order_details add index productId_idx(productId);

      

If orderId

not indexed on order_details

, add the following

alter table order_details add index orderId_idx(orderId);

      

+2


source


Try using MINUS

SELECT DISTINCT(o.CustomerId) 
FROM orders o 
JOIN order_details od ON od.orderId = o.orderId
WHERE od.productId = 'prod1'

MINUS

SELECT DISTINCT(o2.CustomerId) 
FROM orders o2 
JOIN order_details od2 ON od2.orderId = o2.orderId
WHERE o2.customerId = o.customerId 
AND od2.productId = 'prod2'

      

+2


source







All Articles