Customers who ordered "X" and did not order "Y" (MySQL)
Let's assume there is a database with these tables:
Orders
OrderId - CustomerId
----------------------
100 - 1
101 - 2
102 - 1
Order_Details
DetailId - OrderId - ProductId
---------------------------------
1 - 100 - prod1
2 - 100 - prod2
3 - 101 - prod1
4 - 102 - prod3
I need to get a list of CustomerIds who bought product "1" but never bought product "2".
With the provided sample data, I should get CustomerId 2.
I wrote this SQL clause, but the real database is really big and very slow ... Is there a way to improve my suggestion?
SELECT DISTINCT(o.CustomerId)
FROM orders o
JOIN order_details od ON od.orderId = o.orderId
WHERE od.productId = 'prod1'
AND o.customerId NOT IN
(SELECT DISTINCT(o.CustomerId) FROM order_details od2 WHERE
od2.productId = 'prod2');
Thank!
source to share
The use of the operator is IN
not recommended for a large list of values. You can use EXISTS
or add more joins to your query instead .
Using Exists:
SELECT DISTINCT(o.CustomerId)
FROM orders o
JOIN order_details od ON od.orderId = o.orderId
WHERE od.productId = 'prod1'
AND NOT EXISTS
(
SELECT 1
FROM orders o2
JOIN order_details od2 ON od2.orderId = o2.orderId
WHERE o2.customerId = o.customerId
AND od2.productId = 'prod2'
);
Using unions:
SELECT DISTINCT(o.CustomerId)
FROM orders o
INNER JOIN order_details od ON od.orderId = o.orderId
LEFT JOIN order_details od2 ON od2.orderId = o.orderId AND od2.productId = 'prod1'
WHERE od.productId = 'prod1'
AND od2.DetailId IS NULL -- Assuming that the detailId column does not allow null values.
Note: The code is written right here and is not tested, I may have made some mistakes. To get better quality answers, I suggest to always include DDL + DML on sampled data in your request so that people can validate the answers before posting them.
source to share
You can convert the engine sub-query
at exists
https://dev.mysql.com/doc/refman/5.5/en/subquery-optimization-with-exists.html
SELECT DISTINCT o.CustomerId
FROM orders o
JOIN order_details od ON od.orderId = o.orderId
WHERE od.productId = 'prod1'
and not exists(
select 1 from order_details od2
WHERE o.customerId = od2.customerId
and od2.productId = 'prod2'
)
Now you also need indexing for this.
Assuming that orderId
is the primary key on orders
and is already indexed on order_details
, you need
alter table order_details add index productId_idx(productId);
If orderId
not indexed on order_details
, add the following
alter table order_details add index orderId_idx(orderId);
source to share
Try using MINUS
SELECT DISTINCT(o.CustomerId)
FROM orders o
JOIN order_details od ON od.orderId = o.orderId
WHERE od.productId = 'prod1'
MINUS
SELECT DISTINCT(o2.CustomerId)
FROM orders o2
JOIN order_details od2 ON od2.orderId = o2.orderId
WHERE o2.customerId = o.customerId
AND od2.productId = 'prod2'
source to share