Setting up Dim and Fact tables for the data warehouse

I am tasked with creating a data warehouse for a client. The tables I've been given don't really match the traditional examples (products / orders), so I need help getting started. The client is essentially a processing center for cases (similar to a court case). Every day, new cases are entered into the database in the "cases" table, where each column holds a piece of information relevant to the case. As a case is processed, additional one-to-many tables are populated with events related to the case. There are quite a few such event tables (case-open, case-dept1, case-dept2, case-dept3, etc.), and each of them has a caseid that refers back to the "cases" table. There are also several lookup tables.
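To give a rough idea of the shape (simplified, and with names changed), the source schema looks something like this:

    -- Simplified sketch of the source schema; the real tables have many more columns.
    CREATE TABLE cases (
        caseid      INT PRIMARY KEY,
        case_type   VARCHAR(50),
        opened_date DATE
        -- ...plus one column per piece of case information
    );

    -- One of several one-to-many event tables, all keyed back to cases.
    CREATE TABLE case_dept1 (
        eventid    INT PRIMARY KEY,
        caseid     INT NOT NULL REFERENCES cases (caseid),
        event_time TIMESTAMP,
        event_desc VARCHAR(200)
    );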

Currently, the reporting need is to identify bottlenecks at different stages of the process, and the granularity is at the hour level for certain areas of the process.

I might be asking too much here, but I'm looking for some direction on how I should design the Dim and Fact tables, plus any other suggestions you might have.

+1




5 answers


I suggest you check out the Kimball books, especially this one, which should contain a few examples to help you think about how to apply them in your problem area.

In any case, you need to decide whether a dimensional model is suitable at all. It is entirely possible to report off a 3NF corporate database with appropriate indexes, summaries, and so on.



Without seeing your current schema, it is REALLY hard to tell. It sounds like you will have several star models with some conformed dimensions tying them together, so you could have the case dimension as one of your conformed dimensions. The facts from each event table would become fact tables linking both the conformed dimensions and any other dimensions appropriate to the facts; for example, if there is an employee ID on case-open, the case-open fact table would reference the corresponding employee dimension. The conformed case dimension can then be referenced by several of your fact tables.

Kimball's modeling method is fairly straightforward and can be followed like a recipe: start by identifying all of your facts and grouping them into fact tables, then identify the dimensions of each fact table, group those into dimension tables, and define the type of each dimension.
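As a rough sketch (table and column names are invented here, since I can't see your schema), the conformed case dimension shared by two stars might look like:

    CREATE TABLE dim_case (
        case_key  INT PRIMARY KEY,  -- surrogate key
        caseid    INT,              -- natural key from the source system
        case_type VARCHAR(50)
    );

    CREATE TABLE dim_employee (
        employee_key  INT PRIMARY KEY,
        employee_id   INT,
        employee_name VARCHAR(100)
    );

    -- One star per event area; both facts share the conformed case dimension.
    CREATE TABLE fact_case_open (
        case_key     INT REFERENCES dim_case (case_key),
        employee_key INT REFERENCES dim_employee (employee_key),
        time_key     INT
    );

    CREATE TABLE fact_case_dept1 (
        case_key INT REFERENCES dim_case (case_key),
        time_key INT
    );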

+2




The fact table is the occurrence of an event, and it is a "factless" fact table because it carries no numeric measure. The dimensions are time, event type, case, and possibly some others, depending on what other data is in the system.

You need to union the event tables into one fact table, tagged with an event type dimension. Throughput / bottleneck reporting then comes down to calculating the differences between event times for specific combinations of event types on a given case.
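As a sketch, assuming each event table carries an event timestamp (names and event type keys are illustrative):

    -- Tag each source event table with an event type and union them
    -- into a single fact table.
    INSERT INTO fact_event (caseid, event_type_key, event_time)
    SELECT caseid, 1, event_time FROM case_open
    UNION ALL
    SELECT caseid, 2, event_time FROM case_dept1
    UNION ALL
    SELECT caseid, 3, event_time FROM case_dept2;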

The reports would compute the times between events and possibly display them in a histogram. You could also mark certain combinations of event types and apply a flag to the events of interest. Those events could then have the elapsed time recorded against them, allowing you to perform slice-and-dice operations over the times with an OLAP tool.



If you want to compare certain stages in the life cycle, you would have a table with case type, event type 1, event type 2, and the elapsed time between them.
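A sketch of that calculation over the unioned fact table above (DATEDIFF is SQL Server flavoured; adjust for your database):

    -- Elapsed time between two stages of interest, per case.
    SELECT c.case_type,
           e1.event_type_key AS event_type_1,
           e2.event_type_key AS event_type_2,
           DATEDIFF(hour, e1.event_time, e2.event_time) AS elapsed_hours
    FROM fact_event AS e1
    JOIN fact_event AS e2
      ON e2.caseid = e1.caseid
    JOIN cases AS c
      ON c.caseid = e1.caseid
    WHERE e1.event_type_key = 1   -- e.g. case opened
      AND e2.event_type_key = 2;  -- e.g. dept1 step completed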

With a little massaging, you could feed this into a data mining toolkit, or even run a simple regression analysis, to uncover correlations between case attributes and event timings (YMMV).

+4




Like any other facet of development, you have to approach the problem from the end requirements ("user stories", if you will) backwards. The most conservative approach to a warehouse is to simply present a copy of the transactional database. From there, based on the requirements, specific optimizations can be made to improve the performance of specific data access patterns. It is important, however, to view these as optimizations and not to assume that a data warehouse must automatically be a complex explosion of every possible dimension over every fact. In my experience, for most purposes a straight representation is adequate or even ideal for 90+% of analytic queries. For the rest, first consider indexes, indexed views, additional statistics, or other optimizations that can be made without affecting the structures. Then, if aggregations or other redundant structures are needed to improve performance, consider separating these into a "data mart" (at least conceptually), which provides a separation between the primitive facts and their derivations. Finally, if the requirements are too fluid and the aggregation demands too great to perform efficiently that way, then you might consider wholesale explosions of data, i.e. star schemas. Even then, limit this to the smallest cross-section of the data possible.
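To illustrate the middle option: in SQL Server, a recurring aggregation can often be served by an indexed view over the base fact rather than a new star. A sketch, assuming a dbo.fact_event table with an event_time column:

    -- Materialize an hourly event count as an indexed view instead of
    -- building a separate star schema for it.
    CREATE VIEW dbo.v_events_per_hour
    WITH SCHEMABINDING
    AS
    SELECT event_type_key,
           CONVERT(date, event_time)  AS event_date,
           DATEPART(hour, event_time) AS event_hour,
           COUNT_BIG(*)               AS event_count
    FROM dbo.fact_event
    GROUP BY event_type_key, CONVERT(date, event_time), DATEPART(hour, event_time);
    GO

    -- The unique clustered index is what makes the view materialized.
    CREATE UNIQUE CLUSTERED INDEX ix_events_per_hour
    ON dbo.v_events_per_hour (event_type_key, event_date, event_hour);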

+1




Here's what I came up with, essentially. Thanks NXC.

Fact Events

EventID, TimeKey, CaseID

Dim Events

EventID, EventDesc

Dim Time

TimeKey

Dim Regions

RegionID, RegionDesc

Cases

CaseID, RegionID
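An hour-level bottleneck query over this model could then look something like this (assuming Dim Time carries date and hour attributes, here called TheDate and TheHour, and writing the table names without spaces):

    -- Events per region per hour, highest first, to spot bottlenecks.
    SELECT r.RegionDesc,
           t.TheDate,
           t.TheHour,
           COUNT(*) AS EventCount
    FROM FactEvents AS f
    JOIN Cases      AS c ON c.CaseID   = f.CaseID
    JOIN DimRegions AS r ON r.RegionID = c.RegionID
    JOIN DimTime    AS t ON t.TimeKey  = f.TimeKey
    GROUP BY r.RegionDesc, t.TheDate, t.TheHour
    ORDER BY EventCount DESC;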

0




This may be a case of choosing a solution before considering the problem. Not all data stores fit the star schema model. I don't see you aggregating any data here. So far, what we have is a factless fact table and at least one rapidly changing dimension (cases).

Looking at what I can see so far, I think the central entity in this model should be the case. Trying to put the event at the centre doesn't seem right. Try looking at it differently: perhaps start from cases and the events that happen to them.

0








