Getting rid of duplicate results in a MySQL query when using UNION

I have a MySQL query to get items that had recent activity. Users can post a review of an item or add it to their wishlist, and I want to get all the items that either received a new review or were added to a wishlist in the last x days.

The query looks a bit like this (slightly simplified):

SELECT items.*, reactions.timestamp AS date FROM items
LEFT JOIN reactions ON reactions.item_id = items.id
WHERE reactions.timestamp > 1251806994
GROUP BY items.id

UNION

SELECT items.*, wishlists.timestamp AS date FROM items
LEFT JOIN wishlists ON wishlists.item_id = items.id
WHERE wishlists.timestamp > 1251806994
GROUP BY items.id

ORDER BY date DESC LIMIT 5

      

This works, but when an item has both been added to a wishlist and received a review, it is returned twice. UNION usually removes duplicates, but since the date differs between the two rows, both rows are returned. Is there some way I can tell MySQL to ignore the date when removing duplicate rows?
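To make the behavior concrete, here is a minimal sketch using Python's sqlite3 module instead of MySQL (the literal values are made up). UNION treats two rows as duplicates only when every column matches, so a differing date keeps both rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Both branches produce an identical row, so UNION keeps one copy.
same_date = conn.execute(
    "SELECT 1 AS id, 'book' AS name, 100 AS date "
    "UNION "
    "SELECT 1, 'book', 100"
).fetchall()

# Same item, different date: the rows are not identical, so both survive.
diff_date = conn.execute(
    "SELECT 1 AS id, 'book' AS name, 100 AS date "
    "UNION "
    "SELECT 1, 'book', 200"
).fetchall()

print(len(same_date), len(diff_date))
```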

I also tried to do something like this:

SELECT items.*, IF(wishlists.id IS NOT NULL, wishlists.timestamp, reactions.timestamp) AS date FROM items
LEFT JOIN reactions ON reactions.item_id = items.id
LEFT JOIN wishlists ON wishlists.item_id = items.id

WHERE (wishlists.id IS NOT NULL AND wishlists.timestamp > 1251806994) OR
(reactions.id IS NOT NULL AND reactions.timestamp > 1251806994)
GROUP BY items.id

ORDER BY date DESC LIMIT 5

      

But for some reason it turned out to be insanely slow (took about half a minute).



3 answers


I solved this myself based on larryb82's idea. I basically did the following:

SELECT * FROM (
    SELECT items.*, reactions.timestamp AS date FROM items
    LEFT JOIN reactions ON reactions.item_id = items.id
    WHERE reactions.timestamp > 1251806994
    GROUP BY items.id

    UNION

    SELECT items.*, wishlists.timestamp AS date FROM items
    LEFT JOIN wishlists ON wishlists.item_id = items.id
    WHERE wishlists.timestamp > 1251806994
    GROUP BY items.id

    ORDER BY date DESC LIMIT 5
) AS items

GROUP BY items.id
ORDER BY date DESC LIMIT 5

      



I realize this probably doesn't pick the most recent date for each item. I'm not sure whether that matters, or what to do about it if it does.



Not sure how well this will work, but you can try:



SELECT item_field_1, item_field_2, ..., MAX(date) AS date
FROM
  (the query you posted) AS recent
GROUP BY item_field_1, item_field_2, ...
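As a rough, runnable sketch of this idea, the snippet below uses Python's sqlite3 module rather than MySQL. The schema (`items.name`, the sample rows) and the alias `recent` are assumptions for illustration only. It also swaps in UNION ALL and plain JOINs, since the outer GROUP BY already collapses duplicates and the WHERE clauses discard unmatched rows anyway:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; item 1 has both a recent review and a
# recent wishlist entry, item 3 only has an old wishlist entry.
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reactions (item_id INTEGER, timestamp INTEGER);
    CREATE TABLE wishlists (item_id INTEGER, timestamp INTEGER);
    INSERT INTO items VALUES (1, 'book'), (2, 'lamp'), (3, 'chair');
    INSERT INTO reactions VALUES (1, 1251900000), (2, 1251850000);
    INSERT INTO wishlists VALUES (1, 1251950000), (3, 1200000000);
""")

# The outer query keeps one row per item with its most recent activity date.
rows = conn.execute("""
    SELECT id, name, MAX(date) AS date
    FROM (
        SELECT items.id, items.name, reactions.timestamp AS date
        FROM items JOIN reactions ON reactions.item_id = items.id
        WHERE reactions.timestamp > :since
        UNION ALL
        SELECT items.id, items.name, wishlists.timestamp AS date
        FROM items JOIN wishlists ON wishlists.item_id = items.id
        WHERE wishlists.timestamp > :since
    ) AS recent
    GROUP BY id, name
    ORDER BY date DESC
    LIMIT 5
""", {"since": 1251806994}).fetchall()

print(rows)
```

Note that the LIMIT is applied only after grouping here; limiting inside the subquery could leave fewer than five items once duplicates are merged.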

      



I don't think you need UNION here at all.


SELECT items.*, GREATEST(COALESCE(wishlists.timestamp, 0), COALESCE(reactions.timestamp, 0)) AS date
FROM items
LEFT JOIN reactions ON reactions.item_id = items.id AND reactions.timestamp > 1251806994
LEFT JOIN wishlists ON wishlists.item_id = items.id AND wishlists.timestamp > 1251806994
ORDER BY date DESC limit 5

      

Your LEFT JOIN query above was probably very slow because of the predicate with an OR in it. You asked the database to join three tables together and then scan that result for timestamp information. My statement should build a smaller intermediate table, because the timestamp filters are applied in the join conditions. Items that have neither a recent reaction nor a recent wishlist entry will get a date of 0, so they sort last and presumably won't show up in the top results.
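A small sketch of how the join-condition filters behave, run through Python's sqlite3 module rather than MySQL. The schema and rows are hypothetical, and SQLite's two-argument scalar `MAX` stands in for MySQL's `GREATEST`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; item 3 has no recent activity at all.
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reactions (item_id INTEGER, timestamp INTEGER);
    CREATE TABLE wishlists (item_id INTEGER, timestamp INTEGER);
    INSERT INTO items VALUES (1, 'book'), (2, 'lamp'), (3, 'chair');
    INSERT INTO reactions VALUES (1, 1251900000), (2, 1251850000);
    INSERT INTO wishlists VALUES (1, 1251950000), (3, 1200000000);
""")

# The timestamp filters live in the ON clauses, so old rows never join;
# an item without recent rows gets NULLs, which COALESCE turns into 0.
rows = conn.execute("""
    SELECT items.id,
           MAX(COALESCE(wishlists.timestamp, 0),
               COALESCE(reactions.timestamp, 0)) AS date
    FROM items
    LEFT JOIN reactions ON reactions.item_id = items.id
                       AND reactions.timestamp > :since
    LEFT JOIN wishlists ON wishlists.item_id = items.id
                       AND wishlists.timestamp > :since
    ORDER BY date DESC
""", {"since": 1251806994}).fetchall()

print(rows)
```

With this data, item 3 comes back with a date of 0 at the end of the list. An item with several recent reactions or wishlist entries would still produce one row per matching combination, so a GROUP BY on the item (and possibly a filter on date > 0) may be needed on top of this.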
