MySQL: filtering self-references

We have a table of events (as in calendar events with start and end times) that is queried regularly:

CREATE TABLE event (
  `id` varchar(32) NOT NULL,
  `start` datetime,
  `end` datetime,
  `derivedfrom_id` varchar(32),
  `parent_id` varchar(32) NOT NULL
);

      

  • parent_id points to a calendar table that provides some additional information.
  • Some events were created from another event and therefore have a link to that originating event through the column derivedfrom_id.

When fetching a set of events, we usually query by date (start / end) and calendar (parent_id), and limit the number of results with LIMIT for paging.

The problem we are currently facing: sometimes we need to combine related events for a user into one view. So we run our usual query

SELECT id, start, parent_id
FROM event
WHERE parent_id in (<list of calendars>)
  AND start >= 'some date'
LIMIT x

      

... and then filter out the originating events, because the derived events carry different information and refer to their origin anyway.

As you may have noticed (before we did), we apply the limit before filtering and thus get a set of events with a lower cardinality than we expected, that is, the number of results is less than "x" after filtering.

The only thing I could think of would be to duplicate the query and do a sub-selection:

SELECT id, start, parent_id
FROM event
WHERE parent_id in (<list_of_calendars>)
  AND start >= 'some date'
  AND (/* the part below duplicates the previous conditions */
        derivedfrom_id is not null
        or id not in (
          SELECT derivedfrom_id
          FROM event
          WHERE parent_id in (<list_of_calendars>)
            AND start >= 'some date'
            AND derivedfrom_id is not null
        )
      )
LIMIT x

      

But I can hardly believe that this is the only way to do it. Moreover, our real query is much more complicated.

Is there a better way?


Sample data

(as pointed out in the comment)

Considering these three events:

│ ID   │ DERIVEDFROM_ID   │ PARENT_ID   │ START
├──────┼──────────────────┼─────────────┼─────────────────
│ 100  │ -                │ A           │ 2014-11-18 15:00
│ 101  │ 100              │ B           │ 2014-11-18 15:00
│ 150  │ -                │ A           │ 2014-11-20 08:00

      

... and the limit is 2, I want to get events 101 and 150.

Instead, with the current approach:

  • A query with a limit of 2 returns events 100 and 101
  • After filtering, event 100 is discarded and the only event remaining is 101
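
The shortfall can be reproduced with the sample data. The following is only a minimal sketch, not the real application code: it uses Python's built-in sqlite3 module as a stand-in for MySQL, a cut-down event table, a made-up cutoff date, and an ORDER BY that the original query does not have (added only so the page is deterministic).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event ("
    " id TEXT PRIMARY KEY, start TEXT,"
    " derivedfrom_id TEXT, parent_id TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO event (id, derivedfrom_id, parent_id, start) VALUES (?, ?, ?, ?)",
    [
        ("100", None, "A", "2014-11-18 15:00"),
        ("101", "100", "B", "2014-11-18 15:00"),
        ("150", None, "A", "2014-11-20 08:00"),
    ],
)

# Step 1: the usual paged query with LIMIT 2.
page = conn.execute(
    "SELECT id, derivedfrom_id FROM event"
    " WHERE parent_id IN ('A', 'B') AND start >= '2014-11-01'"
    " ORDER BY start, id LIMIT 2"
).fetchall()

# Step 2: drop every event that another event on the page was derived from.
origin_ids = {derived for _, derived in page if derived is not None}
filtered = [event_id for event_id, _ in page if event_id not in origin_ids]

print([event_id for event_id, _ in page])  # events on the page before filtering
print(filtered)                            # events left after filtering
```

The page holds events 100 and 101, and only 101 is left after filtering, although event 150 would have qualified.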

A note on the expected answers

The SQL above is actually generated by a Java application using JPA. My current solution is to build the where clause and duplicate it. If there is anything JPA-specific, I would appreciate any pointers.

+3




5 answers


Try the following:

SELECT e.*
FROM `event` e            # 'e' from 'event'
  LEFT JOIN `event` d     # 'd' from 'derived'; `LEFT JOIN` gets ALL entries from `e`
    ON e.id = d.derivedfrom_id    # match an event `e` with all those `d` derived from it
WHERE d.id IS NULL        # keep only events `e` without derived events `d`
;

      



The LEFT JOIN selects all events from e and pairs them with the events d that are derived from them. It makes every row of e available for selection, whether or not it has derived events. The WHERE clause keeps only those events from e that have no derived events. It retains derived events as well as originating events that have no derived events, but cuts out the originating events that do have derived events.

Add further WHERE conditions on the fields of table e as you wish, add a LIMIT clause, mix well, serve cold.
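
As a rough check of this recipe against the sample data from the question, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, a reduced event table, a made-up cutoff date, and an ORDER BY added only to make the page deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event ("
    " id TEXT PRIMARY KEY, start TEXT,"
    " derivedfrom_id TEXT, parent_id TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO event (id, derivedfrom_id, parent_id, start) VALUES (?, ?, ?, ?)",
    [
        ("100", None, "A", "2014-11-18 15:00"),
        ("101", "100", "B", "2014-11-18 15:00"),
        ("150", None, "A", "2014-11-20 08:00"),
    ],
)

rows = conn.execute(
    """
    SELECT e.id
    FROM event e
    LEFT JOIN event d ON d.derivedfrom_id = e.id  -- d: events derived from e
    WHERE d.id IS NULL                            -- keep e only if nothing was derived from it
      AND e.parent_id IN ('A', 'B')
      AND e.start >= '2014-11-01'
    ORDER BY e.start, e.id
    LIMIT 2
    """
).fetchall()

kept = [row[0] for row in rows]
print(kept)
```

Because the originating event 100 is removed inside the query, the LIMIT applies to the already filtered rows, and the page contains events 101 and 150.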

+4




I suggest grouping events by their DERIVEDFROM_ID or, if an event is not a derived event, by its ID, using the MySQL function IFNULL (see "SELECT one column if the other is null"):

SELECT id, start, parent_id, text, IFNULL(derivedfrom_id, id) as grouper
FROM event
WHERE parent_id in (<list_of_calendars>)
    AND start >= '<some date>'
GROUP BY grouper
LIMIT <x>

      

This, however, returns either the originating or the derived event arbitrarily. If you only want the derived events, you have to sort the results by ID before grouping (assuming the IDs are ascending, so that derived events have higher IDs than their origin). Since it is not possible to run ORDER BY before GROUP BY in MySQL, you have to fall back to an inner join (see "MySQL order by before group by"):

SELECT e1.* FROM event e1
INNER JOIN
(
    SELECT max(id) maxId, IFNULL(derivedfrom_id, id) as grouper
    FROM event
    WHERE parent_id in (<list_of_calendars>)
        AND start >= '<some date>'
    GROUP BY grouper
) e2
on e1.id = e2.maxId
LIMIT <x>
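
A minimal sketch of this inner-join variant on the sample data from the question, again assuming Python's built-in sqlite3 module as a stand-in for MySQL, a reduced event table, a made-up cutoff date, and an added ORDER BY for determinism. It only works because the sample IDs happen to ascend from origin to derived event, which is exactly the assumption stated above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event ("
    " id TEXT PRIMARY KEY, start TEXT,"
    " derivedfrom_id TEXT, parent_id TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO event (id, derivedfrom_id, parent_id, start) VALUES (?, ?, ?, ?)",
    [
        ("100", None, "A", "2014-11-18 15:00"),
        ("101", "100", "B", "2014-11-18 15:00"),
        ("150", None, "A", "2014-11-20 08:00"),
    ],
)

rows = conn.execute(
    """
    SELECT e1.id
    FROM event e1
    INNER JOIN (
        -- one row per origin: the highest id among the origin and its derivatives
        SELECT MAX(id) AS max_id, IFNULL(derivedfrom_id, id) AS grouper
        FROM event
        WHERE parent_id IN ('A', 'B')
          AND start >= '2014-11-01'
        GROUP BY grouper
    ) e2 ON e1.id = e2.max_id
    ORDER BY e1.start, e1.id
    LIMIT 2
    """
).fetchall()

kept = [row[0] for row in rows]
print(kept)
```

Events 100 and 101 fall into the same group (grouper 100) and only the higher id, 101, survives; event 150 forms its own group.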

      



edit: As Aaron pointed out, the ascending-IDs assumption is not supported by the given data structure. Assuming there is a timestamp column created, you can use a query like this:

SELECT e1.* FROM event e1
INNER JOIN
(
    SELECT max(created) c, IFNULL(derivedfrom_id, id) grouper
    FROM event
    WHERE parent_id IN (<list_of_calendars>)
        AND start >= '<some date>'
    GROUP BY grouper
) e2
ON (e1.id = e2.grouper AND e1.created = c) OR (e1.derivedfrom_id = e2.grouper AND e1.created = c)
LIMIT <x>

      

SQL Fiddle

+3




Try something like this:

Select a.id, a.start, a.parent_id from 
event a , event b
Where a.parent_id in (<list_of_calendars>)
And a.start >= 'some date'
And b.parent_id = a.parent_id
And b.start = a.start
And a.id != b.derivedfrom_id
Limit x

      

0




To omit from the result set those events that have derived events, you can either test each id individually or join against a table of derived-from ids to exclude.

Join:

SELECT event.id, event.start, event.parent_id
  FROM event
  LEFT JOIN (
    /* ids of events that have at least one derived event in range */
    SELECT DISTINCT derivedfrom_id AS id FROM event
     WHERE start >= 'some date' AND parent_id IN (<calendars>)
  ) omit
    ON omit.id = event.id
 WHERE event.parent_id IN (<calendars>)
   AND event.start >= 'some date'
   AND omit.id IS NULL
 LIMIT x

      

Nested select: efficient enough if derivedfrom_id is indexed

SELECT e.id, e.start, e.parent_id
  FROM event e
  WHERE e.parent_id IN (<calendars>)
    AND e.start >= 'some date'
    AND (SELECT e2.id FROM event e2      /* and does not have derived events */
          WHERE e2.derivedfrom_id = e.id
            AND e2.start >= 'some date'
          LIMIT 1) IS NULL
  LIMIT x

      

In MySQL you cannot select for absence directly; you need to build an exclusion list and omit its members explicitly.

Since parent_id (the calendar) can differ between an event and the event derived from it, the calendar condition must be checked in every select. The start check need not be duplicated if we can assume that a derived event cannot occur before its original event.

Note that you describe filtering out the originating event (ID 100, because it has the derived event 101), but I think your nested-select example filters out the derived event.

0




Assuming that the value of parent_id in the derived row matches the value of parent_id in the origin row, and that the value of start in the derived row is guaranteed to be no earlier than start in the origin row ... (these are assumptions, because I don't believe this was specified) ... then ...

One quick fix would be to add a NOT EXISTS predicate to the existing query. We just assign an alias to the table reference in the original query (for example, e) and then add this to the WHERE clause:

   AND NOT EXISTS (SELECT 1 FROM event d WHERE d.derivedfrom_id = e.id)

      

To explain this a little ... for an "origin" row, the subquery finds a matching "derived" row, and when such a row is found, the "origin" row is excluded from the result set.

Back to those assumptions ... if we have no guarantee that parent_id matches between the "origin" and the "derived" row, and/or we have no guarantee about start, then we need to repeat the corresponding predicates (on parent_id and start) in the correlated subquery, to check whether the "derived" row would itself be returned. Adding those predicates makes the query more complicated:

   AND NOT EXISTS ( SELECT 1
                      FROM event d
                     WHERE d.derivedfrom_id = e.id 
                       AND d.parent_id IN (<list of calendars>)
                       AND d.start >= 'some date'
                  )
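
To illustrate why the extra predicates matter, here is a minimal sketch that runs both NOT EXISTS variants on the sample data from the question while querying calendar A only. The choice of calendar, the cutoff date, the ORDER BY, and the use of Python's built-in sqlite3 module in place of MySQL are all assumptions made for this illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event ("
    " id TEXT PRIMARY KEY, start TEXT,"
    " derivedfrom_id TEXT, parent_id TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO event (id, derivedfrom_id, parent_id, start) VALUES (?, ?, ?, ?)",
    [
        ("100", None, "A", "2014-11-18 15:00"),
        ("101", "100", "B", "2014-11-18 15:00"),  # derived event lives in calendar B
        ("150", None, "A", "2014-11-20 08:00"),
    ],
)

# Outer query shared by both variants; only calendar A is requested.
base = (
    "SELECT e.id FROM event e"
    " WHERE e.parent_id IN ('A') AND e.start >= '2014-11-01'"
    " AND NOT EXISTS (SELECT 1 FROM event d WHERE d.derivedfrom_id = e.id{extra})"
    " ORDER BY e.start, e.id"
)

# Variant 1: any derived event hides its origin, even one outside the result set.
simple = [row[0] for row in conn.execute(base.format(extra=""))]

# Variant 2: only derived events that would themselves be returned hide their origin.
strict = [
    row[0]
    for row in conn.execute(
        base.format(extra=" AND d.parent_id IN ('A') AND d.start >= '2014-11-01'")
    )
]

print(simple)
print(strict)
```

The simple variant drops event 100 although its derived event 101 is not part of the result, leaving only 150. With the predicates repeated in the subquery, event 100 is kept alongside 150.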

      


Sometimes we can get better performance by rewriting the query to replace the NOT EXISTS with an equivalent "anti-join" pattern.

To describe it: this is an outer join that finds matching "derived" rows, followed by a filter that removes the rows which had at least one matching "derived" row.

Personally, I find the NOT EXISTS form more intuitive; the anti-join pattern is a little confusing. The advantage of the anti-join is better performance (in some cases).

As an example of an anti-join pattern, I would rewrite the query something like this:

SELECT e.id
     , e.start
     , e.parent_id
  FROM event e
  LEFT
  JOIN event d
    ON d.derivedfrom_id = e.id
   AND d.parent_id IN (<list of calendars>)
   AND d.start >= 'some date'
 WHERE d.derivedfrom_id IS NULL
   AND e.parent_id IN (<list of calendars>)
   AND e.start >= 'some date'
 ORDER BY e.id
 LIMIT x

      

To unpack this a bit, the LEFT [OUTER] JOIN operation finds matching "derived" rows; it returns the rows from e that have matching "derived" rows as well as the rows from e that have no match. The "trick" is the IS NULL condition on a column that is guaranteed not to be NULL when a matching derived row is found, so that the predicate excludes the rows that had a match.

(I also added an ORDER BY clause to make the result more deterministic.)

0








