Reversing (or canceling) a bulk load into the warehouse fact table

We plan to record a "batch ID" for each batch of facts we load, so that we can back out the whole load if we find problems.
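A minimal sketch of the batch-ID idea. It assumes a hypothetical `sales_fact` table with a `batch_id` column and uses Python's built-in `sqlite3` as a stand-in for the real warehouse database. Every row of a load carries the same batch ID, so undoing that load is a single `DELETE ... WHERE batch_id = ?`.

```python
import sqlite3

def load_facts(conn, batch_id, rows):
    """Insert (customer_key, amount) fact rows, tagging each with batch_id."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT INTO sales_fact (batch_id, customer_key, amount) VALUES (?, ?, ?)",
            [(batch_id, customer_key, amount) for customer_key, amount in rows],
        )

def drop_batch(conn, batch_id):
    """Delete every fact row belonging to one batch; returns the number removed."""
    with conn:
        cur = conn.execute("DELETE FROM sales_fact WHERE batch_id = ?", (batch_id,))
    return cur.rowcount

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (batch_id INTEGER, customer_key INTEGER, amount REAL)")
load_facts(conn, 1, [(10, 5.0), (11, 7.5)])
load_facts(conn, 2, [(10, 3.0)])
removed = drop_batch(conn, 2)  # backs out only the second load
```

Wrapping each load in one transaction also means a load that fails halfway never needs a batch-level undo at all; the batch ID is for problems discovered after the commit.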

Should we also track the batch ID in the dimension rows?

Dimension rows seem to follow different rules. If we treat them as slowly changing dimensions and use one of the SCD algorithms that preserve history, then a rollback means little for them.
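For reference, a Type 2 SCD step might look like the sketch below. The `customer_dim` table, its columns, and `scd2_upsert` are hypothetical, and `sqlite3` stands in for the warehouse. A changed attribute closes the current version and inserts a new one, so earlier versions are never overwritten; that is why an extra version left behind after a fact rollback does little harm. The `batch_id` column only shows where a batch ID would go if dimension rows were tagged too.

```python
import sqlite3

def scd2_upsert(conn, natural_key, city, load_date, batch_id):
    """Type 2 SCD: keep the current row if unchanged, else close it and add a new version."""
    with conn:
        row = conn.execute(
            "SELECT customer_key, city FROM customer_dim "
            "WHERE natural_key = ? AND is_current = 1",
            (natural_key,),
        ).fetchone()
        if row is not None and row[1] == city:
            return row[0]  # attribute unchanged: reuse the existing surrogate key
        if row is not None:
            # Close the old version instead of overwriting it (history is preserved).
            conn.execute(
                "UPDATE customer_dim SET is_current = 0, valid_to = ? WHERE customer_key = ?",
                (load_date, row[0]),
            )
        cur = conn.execute(
            "INSERT INTO customer_dim "
            "(natural_key, city, valid_from, valid_to, is_current, batch_id) "
            "VALUES (?, ?, ?, NULL, 1, ?)",
            (natural_key, city, load_date, batch_id),
        )
        return cur.lastrowid  # new surrogate key

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, natural_key TEXT, "
    "city TEXT, valid_from TEXT, valid_to TEXT, is_current INTEGER, batch_id INTEGER)"
)
k1 = scd2_upsert(conn, "C001", "Boston", "2024-01-01", 1)
k2 = scd2_upsert(conn, "C001", "Boston", "2024-02-01", 2)  # no change: same key
k3 = scd2_upsert(conn, "C001", "Denver", "2024-03-01", 3)  # change: new version
```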

Typical scenario. Load dimensions with SCD processing. Load facts. Done.

Extended scenario. Load dimensions with SCD processing. Load facts. Find a problem. Delete the batch of facts. Fix the problem. Reload the facts. Done.

Possible scenario. Load dimensions with SCD processing. Load facts. Find a problem. Delete the batch of facts and the dimension rows it added. Fix the problem. Load dimensions with SCD processing again. Load facts. Done.
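If dimension rows were tagged with a batch ID, the "possible scenario" could be implemented roughly as follows. This is a sketch under several assumptions: hypothetical `customer_dim` and `sales_fact` tables in `sqlite3`; batches are rolled back most recent first; and the version a batch closed is the one whose `valid_to` equals the new version's `valid_from`. It refuses to proceed if a fact from another batch already references a dimension row created by this batch. The extra bookkeeping is part of why tracking dimensions for rollback may not pay off.

```python
import sqlite3

def rollback_batch(conn, batch_id):
    """Undo one batch: its facts, then the SCD2 dimension versions it created."""
    with conn:  # all-or-nothing
        in_use = conn.execute(
            "SELECT COUNT(*) FROM sales_fact f "
            "JOIN customer_dim d ON f.customer_key = d.customer_key "
            "WHERE d.batch_id = ? AND f.batch_id <> ?",
            (batch_id, batch_id),
        ).fetchone()[0]
        if in_use:
            raise ValueError("later batches reference dimension rows from this batch")
        conn.execute("DELETE FROM sales_fact WHERE batch_id = ?", (batch_id,))
        new_rows = conn.execute(
            "SELECT customer_key, natural_key, valid_from FROM customer_dim WHERE batch_id = ?",
            (batch_id,),
        ).fetchall()
        for customer_key, natural_key, valid_from in new_rows:
            conn.execute("DELETE FROM customer_dim WHERE customer_key = ?", (customer_key,))
            # Reopen the version this batch closed (assumed: its valid_to == new valid_from).
            conn.execute(
                "UPDATE customer_dim SET is_current = 1, valid_to = NULL "
                "WHERE natural_key = ? AND valid_to = ?",
                (natural_key, valid_from),
            )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, natural_key TEXT, city TEXT,
                           valid_from TEXT, valid_to TEXT, is_current INTEGER, batch_id INTEGER);
CREATE TABLE sales_fact (batch_id INTEGER, customer_key INTEGER, amount REAL);
INSERT INTO customer_dim VALUES (1, 'C001', 'Boston', '2024-01-01', '2024-03-01', 0, 1);
INSERT INTO customer_dim VALUES (2, 'C001', 'Denver', '2024-03-01', NULL, 1, 2);
INSERT INTO sales_fact VALUES (1, 1, 5.0);
INSERT INTO sales_fact VALUES (2, 2, 7.5);
""")
rollback_batch(conn, 2)  # removes batch 2's fact and the 'Denver' version
```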

Tracking dimension changes per batch doesn't seem to help much. Any advice on how best to handle "undo" or "rollback" in a data warehouse?

Our ETL tools are entirely home-grown Python applications.



1 answer


In my view, as long as you don't abuse your dimensions (e.g. tracking time down to the millisecond), there isn't much to gain from tracking dimensions for rollback. You can also build a tool that cleans up unreferenced dimension rows once a month.
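The monthly cleanup could be a single anti-join delete. A sketch with hypothetical `customer_dim` and `sales_fact` tables in `sqlite3`; in an SCD2 dimension you might restrict it further (for example, to non-current rows) so that versions still needed by future loads are kept.

```python
import sqlite3

def purge_unreferenced(conn):
    """Delete dimension rows that no fact row references; returns the number removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM customer_dim WHERE NOT EXISTS ("
            "SELECT 1 FROM sales_fact f WHERE f.customer_key = customer_dim.customer_key)"
        )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_fact (batch_id INTEGER, customer_key INTEGER, amount REAL);
INSERT INTO customer_dim VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO sales_fact VALUES (1, 1, 5.0), (1, 3, 2.0);
""")
removed = purge_unreferenced(conn)  # row 2 has no facts, so it is purged
```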


