Fact table with information that is regularly updated in the source system

I am building a dimensional data warehouse and am learning how to model the various business processes from my source system in it.

I am currently modeling a "Bid" (a bid for work) from our source system in our data warehouse as a fact table that contains information such as:

  • Bid amount
  • Projected income
  • Sales employee
  • Bid status (active, pending, declined, etc.)
  • etc.

The problem is that the bid (or almost any other process I am trying to model) can go through various states and have its information updated at any moment in the source system. According to Ralph Kimball, fact tables should only be updated if they are "accumulating snapshots", and I am sure that not all of these processes would qualify as accumulating snapshots under the definition below.

How does the Kimball Group recommend modeling these kinds of processes in a data warehouse? And which type of fact table would work for a bid (given the facts above)?

Excerpt from http://www.kimballgroup.com/2008/11/fact-tables/:

The transaction grain corresponds to a measurement taken at a single instant. The grocery store beep is a transaction grain. The measured facts are valid only for that instant and for that event. The next measurement event could happen one millisecond later, or next month, or never. Thus, transaction grain fact tables are unpredictably sparse or dense. We have no guarantee that all the possible foreign keys will be represented. Transaction grain fact tables can be enormous, with the largest containing many billions of records.

The periodic snapshot grain corresponds to a predefined span of time, often a financial reporting period. Figure 1 illustrates a monthly account periodic snapshot. The measured facts summarize activity during or at the end of the time span. The periodic snapshot grain carries a powerful guarantee that all of the reporting entities (such as the bank account in Figure 1) will appear in every snapshot, even if there is no activity. The periodic snapshot is predictably dense, and applications can rely on combinations of keys always being present. Periodic snapshot fact tables can also get large. A bank with 20 million accounts and a 10-year history would have 2.4 billion records in the monthly account periodic snapshot!

The accumulating snapshot fact table corresponds to a predictable process that has a well-defined beginning and end. Order processing, claim processing, service authorization and college admissions are typical candidates. The grain of an accumulating snapshot for order processing, for example, is usually the line item on the order. Notice in Figure 1 that there are multiple dates representing the standard scenario that an order goes through. Accumulating snapshot records are revisited and overwritten as the process progresses through its steps from beginning to end. Accumulating snapshot fact tables are generally much smaller than the other two types because of this overwrite strategy.



1 answer


As mentioned in the comments, "Change Data Capture" is a fairly general term for "how can I handle changes in data objects over time" and there are entire books (and a gazillion posts and articles) on it.

Regardless of any statements that seem to point to a crisp black-and-white or "always do it this way" answer, the real answer is, as usual, "it depends" - in your case, on the grain you need for your fact table.

If your data changes in unpredictable ways or very frequently, it can be difficult to implement Kimball's version of the accumulating snapshot (for example, you have to decide in advance how many "milestone" date columns you will need).
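
To make the difficulty concrete, here is a minimal sketch of an accumulating snapshot for bids, using Python's built-in sqlite3 module. The table name, the column names and the status-to-milestone mapping are all assumptions made for illustration; none of them come from your source system. Each bid has exactly one row, which is overwritten as the bid changes state, and any status without a pre-defined milestone column cannot be recorded.

```python
import sqlite3

# Hypothetical mapping from a source status to the milestone column it fills.
MILESTONE_COLUMN = {
    "pending": "pending_date",
    "active": "active_date",
    "declined": "declined_date",
}

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_bid_accumulating (
        bid_key        INTEGER PRIMARY KEY,  -- grain: one row per bid
        bid_amount     REAL,
        bid_status     TEXT,
        pending_date   TEXT,                 -- one column per known milestone
        active_date    TEXT,
        declined_date  TEXT
    )
""")

def apply_status_change(conn, bid_key, status, amount, event_date):
    """Overwrite the single row for a bid when its state changes."""
    if status not in MILESTONE_COLUMN:
        # The weak spot of this design: an unplanned status has no column.
        raise ValueError(f"no milestone column for status {status!r}")
    column = MILESTONE_COLUMN[status]  # taken from the fixed mapping above
    conn.execute(
        "INSERT OR IGNORE INTO fact_bid_accumulating (bid_key) VALUES (?)",
        (bid_key,),
    )
    # Keep the first date on which each milestone was reached.
    conn.execute(
        f"UPDATE fact_bid_accumulating "
        f"SET bid_amount = ?, bid_status = ?, {column} = COALESCE({column}, ?) "
        f"WHERE bid_key = ?",
        (amount, status, event_date, bid_key),
    )

apply_status_change(conn, 1, "pending", 1000.0, "2024-01-05")
apply_status_change(conn, 1, "active", 1200.0, "2024-01-09")
row = conn.execute(
    "SELECT * FROM fact_bid_accumulating WHERE bid_key = 1"
).fetchone()
```

After the two calls the bid still occupies a single row, with the latest amount and status and both milestone dates filled in. If bids can bounce between states or gain new states over time, the fixed set of columns is what becomes hard to maintain.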

So, if you prefer, you can make the fact table a transaction fact table rather than a snapshot, with (Bid Key, Timestamp) as the fact key. Then, in your application layer (be it a view, a materialized view, an actual application, or whatever), you can ensure that queries only receive the most recent version of each bid (note that this can be viewed as a kind of virtual accumulating snapshot). If you find that you don't need the previous versions (the history of each bid), you can create a routine that prunes them (i.e. deletes them or moves them somewhere else).
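
A minimal sketch of that approach, again with Python's built-in sqlite3 module. The table, view and column names (fact_bid_version, v_bid_current, load_ts and so on) and the sample rows are made up for illustration. Every captured version of a bid is inserted as a new row, a view exposes only the newest version per bid, and an optional routine prunes the older versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per captured version of a bid; the key is (bid_key, load_ts).
    CREATE TABLE fact_bid_version (
        bid_key           INTEGER NOT NULL,
        load_ts           TEXT    NOT NULL,  -- ISO-8601 text sorts chronologically
        bid_status        TEXT    NOT NULL,
        bid_amount        REAL    NOT NULL,
        projected_income  REAL    NOT NULL,
        PRIMARY KEY (bid_key, load_ts)
    );

    -- The "virtual accumulating snapshot": newest version of each bid only.
    CREATE VIEW v_bid_current AS
    SELECT f.*
    FROM fact_bid_version AS f
    WHERE f.load_ts = (
        SELECT MAX(g.load_ts)
        FROM fact_bid_version AS g
        WHERE g.bid_key = f.bid_key
    );
""")

# Invented sample data: bids 1 and 2 each changed once, bid 3 never changed.
rows = [
    (1, "2024-01-05T09:00:00", "pending", 1000.0, 1500.0),
    (1, "2024-01-09T14:30:00", "active", 1200.0, 1800.0),
    (2, "2024-01-06T11:15:00", "pending", 500.0, 700.0),
    (2, "2024-01-07T08:00:00", "declined", 500.0, 0.0),
    (3, "2024-01-08T16:45:00", "pending", 750.0, 900.0),
]
conn.executemany("INSERT INTO fact_bid_version VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

def current_bids(conn):
    """Return (bid_key, bid_status, bid_amount) for the latest version of each bid."""
    return conn.execute(
        "SELECT bid_key, bid_status, bid_amount FROM v_bid_current ORDER BY bid_key"
    ).fetchall()

def prune_history(conn):
    """Delete every version except the newest per bid; return rows removed."""
    cur = conn.execute("""
        DELETE FROM fact_bid_version
        WHERE load_ts < (
            SELECT MAX(g.load_ts)
            FROM fact_bid_version AS g
            WHERE g.bid_key = fact_bid_version.bid_key
        )
    """)
    conn.commit()
    return cur.rowcount
```

Reports would query v_bid_current, while the base table keeps the full history for as long as you need it. Calling prune_history(conn) is only appropriate once you are sure the history is not required; moving the old rows to an archive table first is the more cautious variant.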



Alternatively, you can load the fact (Bid) only once it has reached its final state, but then you are likely to have a significant lag, because a new or updated bid will not reach the fact table for some time.
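
A rough sketch of that trade-off in plain Python. Which statuses count as "final" is an assumption here (you would have to define that with the business), and the sample bids are invented.

```python
from datetime import date

# Assumption: the statuses after which a bid no longer changes.
FINAL_STATUSES = {"declined", "closed"}

# Invented rows as they might look when extracted from the source system.
source_bids = [
    {"bid_key": 1, "status": "active",
     "created": date(2024, 1, 5), "last_updated": date(2024, 1, 9)},
    {"bid_key": 2, "status": "declined",
     "created": date(2024, 1, 6), "last_updated": date(2024, 2, 20)},
    {"bid_key": 3, "status": "pending",
     "created": date(2024, 1, 8), "last_updated": date(2024, 1, 8)},
]

def bids_ready_to_load(bids, already_loaded):
    """Pick bids that reached a final status and are not in the fact table yet."""
    return [
        b for b in bids
        if b["status"] in FINAL_STATUSES and b["bid_key"] not in already_loaded
    ]

def load_lag_days(bid):
    """Days between the bid being created and it becoming loadable."""
    return (bid["last_updated"] - bid["created"]).days

ready = bids_ready_to_load(source_bids, already_loaded=set())
lags = {b["bid_key"]: load_lag_days(b) for b in ready}
```

With this sample data only bid 2 is loaded, and only 45 days after it was created; bids 1 and 3 stay invisible to the warehouse until they are finalized.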

Either way, there are some solid and proven methods for handling this - you just need to clearly define the business requirements and design accordingly.

Good luck!
