How can I incrementally process an OLAP cube?

I have a multidimensional OLAP cube for sales, backed by a very large database. The first time, I processed the cube completely (a full process). But a full process redoes everything each time, so it takes a long time. I need incremental processing instead, but I don't know how to set it up. Can you help me?

What approach should I follow? I found several articles on this subject, such as this one.

But I don't know what condition I should write in the partition's source query.



1 answer


Date is the typical way to partition fact tables. You have a Sales_TransactionDate column in your source, so that is the obvious choice for the partitioning attribute.

Depending on the amount of data, and therefore the number of partitions you want to create, you can partition by year, month, day, or something in between.
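Concretely, the "condition" you asked about is just a WHERE clause on the date column in each partition's source query. A minimal sketch for monthly partitions, assuming an illustrative fact table named `dbo.FactSales` (adjust the table and column names to your schema):

```sql
-- Source query bound to the partition covering March 2014 (names are illustrative).
-- Each partition gets the same query with a different date range, and the
-- ranges must not overlap, or rows will be counted twice.
SELECT *
FROM dbo.FactSales
WHERE Sales_TransactionDate >= '20140301'
  AND Sales_TransactionDate <  '20140401';
```

Using a half-open range (`>= start AND < next start`) avoids boundary problems with time components on the date column.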

The idea is that you process the entire cube in full only once. Then, every (for example) night, you re-process only the partition for the current (for example) month. This only works if older data (i.e. data up to the end of the previous month, say) never changes in the source system. If it does change, you will miss those changes, because the partitions before the current month are no longer processed.
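In SSAS multidimensional, the nightly job can then issue an XMLA Process command against just that one partition instead of the whole cube. A sketch, where the database, cube, measure group, and partition IDs are assumptions you would replace with your own:

```xml
<!-- Re-process only the current month's partition (IDs are illustrative). -->
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Type>ProcessFull</Type>
  <Object>
    <DatabaseID>SalesDB</DatabaseID>
    <CubeID>Sales</CubeID>
    <MeasureGroupID>FactSales</MeasureGroupID>
    <PartitionID>FactSales_Current</PartitionID>
  </Object>
</Process>
```

This command can be run from a SQL Server Agent job step of type "SQL Server Analysis Services Command", so it is easy to schedule nightly.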

So this is the key question for incremental processing: you need to know for how long after it first appears the data in the source system can still change (obviously, only changes relevant to the cube matter; if a column the cube does not use changes, it does not matter), and at what point it settles into an unchanged state.



This is an ETL question, related to how (if at all) you use slowly changing dimension Type 2 attributes, and whether the source system has any indication of when a row was updated (e.g. a LastUpdated datetime column).

(Edit - as per comment below)

You need to size the most recent partition so that processing it alone captures all possible changes. For example, if a row can change up to 6 months after the transaction date (or whatever date you partition by), you should process the last 6 months of data so you don't miss any changes.

But this constraint only affects the size of the most recent partition; older partitions can be sized however you like. You can reduce the amount of processing in the most recent partition if the source system has a mechanism for marking rows as "changed". (One example is a "LastUpdated" column that is always set to the current date/time when a row is updated; another is SQL Server Change Data Capture (CDC).)
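If the source does have such a "LastUpdated" column, a nightly check can tell you which older date ranges received changes, so you only re-process the partitions that actually need it. A sketch, assuming the same illustrative `dbo.FactSales` table and a hypothetical `@LastProcessedAt` parameter recording when the cube was last processed:

```sql
-- List the months that received changes since the last processing run
-- (names and the @LastProcessedAt parameter are illustrative).
-- Each (year, month) returned maps to one partition to re-process.
SELECT DISTINCT
    YEAR(Sales_TransactionDate)  AS SalesYear,
    MONTH(Sales_TransactionDate) AS SalesMonth
FROM dbo.FactSales
WHERE LastUpdated > @LastProcessedAt;
```

With CDC enabled instead, you would read the change tables for the fact table and derive the affected date ranges the same way.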
