Which approach is better from a database performance standpoint: a single table queried with a GROUP BY, or multiple tables kept in sync with serializable transactions?
I am developing a reservation system backed by an SQL database (not tied to a specific DBMS).
I currently use a view, `actual_reserved_resources`, built on a custom GROUP BY with filtering logic to identify the reservations that are actually in effect:
CREATE VIEW `reserv-io`.`actual_reserved_resources` AS
SELECT `r`.`id`, `a`.`resource_id`, `a`.`type_id`, `a`.`status_id`,
       `a`.`reservation_start`, `a`.`reservation_end`
FROM `reserv-io`.`actions` `a`
JOIN (
    SELECT `r`.`id` `id`, MAX(`a`.`id`) `action_id`
    FROM `reserv-io`.`reservations` `r`
    JOIN `reserv-io`.`actions` `a` ON `r`.`id` = `a`.`reservation_id`
    WHERE `r`.`id` NOT IN (
            SELECT `r`.`id`
            FROM `reserv-io`.`reservations` `r`
            JOIN `reserv-io`.`actions` `a` ON `r`.`id` = `a`.`reservation_id`
            WHERE `a`.`status_id` IN (
                SELECT `id` FROM `reserv-io`.`reservation_statuses`
                WHERE `name` = 'Canceled'))
      AND `a`.`status_id` IN (
            SELECT `id` FROM `reserv-io`.`reservation_statuses`
            WHERE `name` = 'Approved' OR `name` = 'Accepted')
    GROUP BY `r`.`id`
) `b` ON `a`.`id` = `b`.`action_id`
JOIN `reserv-io`.`reservations` `r` ON `r`.`id` = `a`.`reservation_id`;
And I determine whether a given reservation overlaps with any other reservation using a procedure like this:
DELIMITER $$
CREATE PROCEDURE HAS_OVERLAPPING_RESERVATION_WITH_TYPE_SELF_CHECK(
    IN RESERVATION BIGINT,
    IN RESOURCE INT,
    IN RESERVATION_TYPE INT,
    IN STARTS_AT DATETIME,
    IN ENDS_AT DATETIME,
    OUT RESULT BIT)
BEGIN
    SELECT CASE WHEN EXISTS(
               SELECT *
               FROM actual_reserved_resources r
               WHERE r.resource_id = RESOURCE
                 AND r.type_id = RESERVATION_TYPE
                 AND r.reservation_start < ENDS_AT  -- standard interval-overlap test
                 AND r.reservation_end > STARTS_AT
                 AND r.id <> RESERVATION)           -- exclude the reservation itself
           THEN TRUE
           ELSE FALSE END
    INTO RESULT;
END$$
DELIMITER ;
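For context, this is how the procedure is invoked; the ids and timestamps below are placeholders, not values from my real data:

```sql
-- Check whether a hypothetical reservation 42 on resource 7, type 1,
-- would overlap any other active reservation in the given window.
CALL HAS_OVERLAPPING_RESERVATION_WITH_TYPE_SELF_CHECK(
    42,                     -- reservation id to exclude from the check
    7,                      -- resource id
    1,                      -- reservation type id
    '2020-01-01 10:00:00',  -- proposed start
    '2020-01-01 12:00:00',  -- proposed end
    @has_overlap);
SELECT @has_overlap;  -- 1 if an overlapping reservation exists, 0 otherwise
```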
I already have good indexes on the "actions" table, but I'm not sure my single-table approach for all reservation actions really scales. I've heard of an approach where pending and approved reservations are stored in separate tables and synchronized manually in a transaction.
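As I understand it, that alternative would look roughly like this (a sketch with hypothetical `pending_reservations` and `approved_reservations` tables; the move happens in one transaction so a reservation is never in both tables or in neither):

```sql
-- Promote a pending reservation (hypothetical id 42) to approved,
-- atomically, so readers never see it duplicated or missing.
START TRANSACTION;

INSERT INTO approved_reservations
    (id, resource_id, type_id, reservation_start, reservation_end)
SELECT id, resource_id, type_id, reservation_start, reservation_end
FROM pending_reservations
WHERE id = 42;

DELETE FROM pending_reservations
WHERE id = 42;

COMMIT;
```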
My question is: which approach, from your point of view, is better in terms of database performance, scalability, and good data design?
I would suggest that:

- Don't take anyone's answer here on faith; instead, benchmark against real data to measure actual performance.
- Whatever the answer, it will matter less than you fear. It is very unlikely to be the difference between a scalable and a non-scalable system.
- If it turns out there actually is some merit in segregating pending and approved records, you can achieve it with horizontal partitioning (row splitting; worth looking up), so you keep everything in one logical table and avoid complicating your schema in the name of performance.
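A minimal sketch of that partitioning idea, assuming a hypothetical simplified `reservations` table with an integer `status_id` column (1 = pending, 2 = approved). MySQL syntax; note that in MySQL the partitioning column must be part of every unique key, hence the composite primary key:

```sql
-- Pending and approved rows live in separate physical partitions,
-- but queries still see a single logical table.
CREATE TABLE reservations_partitioned (
    id                BIGINT   NOT NULL,
    status_id         INT      NOT NULL,
    resource_id       INT      NOT NULL,
    reservation_start DATETIME NOT NULL,
    reservation_end   DATETIME NOT NULL,
    PRIMARY KEY (id, status_id)
)
PARTITION BY LIST (status_id) (
    PARTITION p_pending  VALUES IN (1),
    PARTITION p_approved VALUES IN (2)
);
```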
You don't specify the size of your dataset or of the system your queries run on. Until your datasets get large, it probably doesn't matter much. My personal experience with large datasets is that joins over more than two tables can become very slow once the system starts swapping to disk rather than keeping things in memory. On Oracle, many years ago, I had a query that had to touch six tables; that one query took about six hours. Rewritten as separate queries against two tables at a time, it took 15 minutes. I had another query in MySQL that touched a number of tables; breaking it into smaller queries reduced the total runtime from seven hours to seven minutes.
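The decomposition described above looks roughly like this, using hypothetical tables `t1`, `t2`, `t3`. Instead of one wide join, materialize an intermediate result and join against it:

```sql
-- One big join (the slow version):
--   SELECT ... FROM t1 JOIN t2 ON t2.t1_id = t1.id
--                      JOIN t3 ON t3.t1_id = t1.id;

-- Decomposed: two small joins, two tables at a time.
CREATE TEMPORARY TABLE tmp_t1_t2 AS
SELECT t1.id, t2.value
FROM t1
JOIN t2 ON t2.t1_id = t1.id;

SELECT tmp.id, tmp.value, t3.extra
FROM tmp_t1_t2 tmp
JOIN t3 ON t3.t1_id = tmp.id;
```

Whether this helps depends entirely on whether the intermediate result fits in memory, so measure before committing to it.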