Which approach is better from a database performance standpoint: a single table queried with a GROUP BY, or multiple tables kept in sync with serializable transactions?
I am developing a reservation system backed by an SQL database (not tied to a specific DBMS).
I currently use a view, `actual_reserved_resources`, built on a custom GROUP BY with filtering logic to identify the reservations that are actually in effect:
CREATE VIEW `reserv-io`.`actual_reserved_resources` AS
SELECT `r`.`id`, `a`.`resource_id`, `a`.`type_id`, `a`.`status_id`,
       `a`.`reservation_start`, `a`.`reservation_end`
FROM `reserv-io`.`actions` `a`
JOIN (
    SELECT `r`.`id` `id`, MAX(`a`.`id`) `action_id`
    FROM `reserv-io`.`reservations` `r`
    JOIN `reserv-io`.`actions` `a` ON `r`.`id` = `a`.`reservation_id`
    WHERE `r`.`id` NOT IN (
            SELECT `r`.`id`
            FROM `reserv-io`.`reservations` `r`
            JOIN `reserv-io`.`actions` `a` ON `r`.`id` = `a`.`reservation_id`
            WHERE `a`.`status_id` IN (
                SELECT `id` FROM `reserv-io`.`reservation_statuses`
                WHERE `name` = 'Canceled'))
      AND `a`.`status_id` IN (
            SELECT `id` FROM `reserv-io`.`reservation_statuses`
            WHERE `name` = 'Approved' OR `name` = 'Accepted')
    GROUP BY `r`.`id`
) `b` ON `a`.`id` = `b`.`action_id`
JOIN `reserv-io`.`reservations` `r` ON `r`.`id` = `a`.`reservation_id`;
And I determine whether a given reservation overlaps with any other reservation using a procedure like this:
DELIMITER $$
CREATE PROCEDURE HAS_OVERLAPPING_RESERVATION_WITH_TYPE_SELF_CHECK(
    IN RESERVATION BIGINT,
    IN RESOURCE INT,
    IN RESERVATION_TYPE INT,
    IN STARTS_AT DATETIME,
    IN ENDS_AT DATETIME,
    OUT RESULT BIT)
BEGIN
    SELECT CASE WHEN EXISTS(
               SELECT *
               FROM actual_reserved_resources r
               WHERE r.resource_id = RESOURCE
                 AND r.type_id = RESERVATION_TYPE
                 AND r.reservation_start < ENDS_AT  -- standard interval-overlap test
                 AND r.reservation_end > STARTS_AT
                 AND r.id <> RESERVATION)           -- exclude the reservation itself
           THEN TRUE
           ELSE FALSE END
    INTO RESULT;
END$$
DELIMITER ;
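For context, this is how the procedure is invoked; the ids and timestamps below are placeholders, not values from my real data:

```sql
-- Check whether a hypothetical reservation 42 on resource 7, type 1,
-- would overlap any other active reservation in the given window.
CALL HAS_OVERLAPPING_RESERVATION_WITH_TYPE_SELF_CHECK(
    42,                     -- reservation id to exclude from the check
    7,                      -- resource id
    1,                      -- reservation type id
    '2020-01-01 10:00:00',  -- proposed start
    '2020-01-01 12:00:00',  -- proposed end
    @has_overlap);
SELECT @has_overlap;  -- 1 if an overlapping reservation exists, 0 otherwise
```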
I already have good indexes on the "actions" table, but I'm not sure my single-table approach for all reservation actions really scales. I've heard of an approach where pending and approved reservations are stored in separate tables and synchronized manually in a transaction.
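As I understand it, that alternative would look roughly like this (a sketch with hypothetical `pending_reservations` and `approved_reservations` tables; the move happens in one transaction so a reservation is never in both tables or in neither):

```sql
-- Promote a pending reservation (hypothetical id 42) to approved,
-- atomically, so readers never see it duplicated or missing.
START TRANSACTION;

INSERT INTO approved_reservations
    (id, resource_id, type_id, reservation_start, reservation_end)
SELECT id, resource_id, type_id, reservation_start, reservation_end
FROM pending_reservations
WHERE id = 42;

DELETE FROM pending_reservations
WHERE id = 42;

COMMIT;
```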
My question is: which approach, from your point of view, is better in terms of database performance, scalability, and good data design?
I would suggest that:

- Don't take anyone's answer here on faith; instead, benchmark against real data to measure actual performance.
- Whatever the answer, it will matter less than you fear. It is very unlikely to be the difference between a scalable and a non-scalable system.
- If it turns out there actually is some merit in segregating pending and approved records, you can achieve it with horizontal partitioning (row splitting; worth looking up), so you keep everything in one logical table and avoid complicating your schema in the name of performance.
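A minimal sketch of that partitioning idea, assuming a hypothetical simplified `reservations` table with an integer `status_id` column (1 = pending, 2 = approved). MySQL syntax; note that in MySQL the partitioning column must be part of every unique key, hence the composite primary key:

```sql
-- Pending and approved rows live in separate physical partitions,
-- but queries still see a single logical table.
CREATE TABLE reservations_partitioned (
    id                BIGINT   NOT NULL,
    status_id         INT      NOT NULL,
    resource_id       INT      NOT NULL,
    reservation_start DATETIME NOT NULL,
    reservation_end   DATETIME NOT NULL,
    PRIMARY KEY (id, status_id)
)
PARTITION BY LIST (status_id) (
    PARTITION p_pending  VALUES IN (1),
    PARTITION p_approved VALUES IN (2)
);
```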
You don't specify the size of your dataset or of the system your queries run on. Until your datasets get large, it probably doesn't matter much. My personal experience with large datasets is that joins over more than two tables can become very slow once the system starts swapping to disk rather than keeping things in memory. On Oracle, many years ago, I had a query that had to touch six tables; that one query took about six hours. Rewritten as separate queries against two tables at a time, it took 15 minutes. I had another query in MySQL that touched a number of tables; breaking it into smaller queries reduced the total runtime from seven hours to seven minutes.
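The decomposition described above looks roughly like this, using hypothetical tables `t1`, `t2`, `t3`. Instead of one wide join, materialize an intermediate result and join against it:

```sql
-- One big join (the slow version):
--   SELECT ... FROM t1 JOIN t2 ON t2.t1_id = t1.id
--                      JOIN t3 ON t3.t1_id = t1.id;

-- Decomposed: two small joins, two tables at a time.
CREATE TEMPORARY TABLE tmp_t1_t2 AS
SELECT t1.id, t2.value
FROM t1
JOIN t2 ON t2.t1_id = t1.id;

SELECT tmp.id, tmp.value, t3.extra
FROM tmp_t1_t2 tmp
JOIN t3 ON t3.t1_id = tmp.id;
```

Whether this helps depends entirely on whether the intermediate result fits in memory, so measure before committing to it.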