Why are datekey tables actually always INT?

I am looking at a column datekey

from fact tables in AdventureWorksDW

and they are all type int

.

Is there a reason for this and not the type date

?

I understand that creating a clustered index consisting of int

will optimize query speed. But let me tell you that I want to get data from last week. I can subtract 6 from the date 20170704

and I get 20170698

which is not a valid date. Therefore, I have to give everything on date

, subtract, and then do how int

.

Right now I have a foreign key constraint to make sure something other than "YYYYMMDD" is not inserted. This would not be necessary with type date

. Just now I wanted to get data between 6/28 and 7/4. I can't just subtract six from "20170703"; I need to cast from int as of today.

It seems like a lot of hassle and not many benefits.

Thank.

+3


source to share


4 answers


Yes, you can use Date datatype and have that as your primary key in Fact and dimension and you are going to save yourself a byte in the process.

And then you have to deal with a sale that's recorded and we didn't know the date. What then? In a "normal" dimensional model, you define unknown surrogate values ​​so people know there is data, and this can be useful, but it is incomplete. The general convention is to make it null or negative. Ease of use for integers.

The dates are a bit strange in that we usually use smart keys - yyyymmdd. From a debugging perspective, it's easy to quickly figure out what a date is without having to look up against your dimension.



You cannot make an invalid date. Soooo, what then? Everyone "knows" that 1899-12-31 is a "fake" date (or something like tickling your imagination) and that all is well and good until someone fills the date with their fingers and magically hits your sentry date and now you have valid unknowns mixed with just plain bad data entry.

If you are doing date calculations with a smart key, you are doing it wrong. You need to go to your data dimension to properly resolve the value and use methods that know about date logic because it is ugly and frustrating beyond simple things like month length and leap year calculations.

+2


source


Actually this fact table is related to the DimDate table and if you join this table you get a lot more options for searching by time and then if you get adding and removing days / months.

Tell me that you need a list of all orders for the second Saturday in May? Or all orders last week in December? Also, some businesses regulate their fiscal year differently. Some start in June, some start in January.



Thus, DimDate gives you flexibility when you need to do complex date searches without any calculations and using a simple index lookup in DimDate

+1


source


Good question, but the answer depends on what kind of data object you are targeting. SSAS, for example, covers tabular and multidimensional.

In the multidimensional case, you will never query the fact table through SQL, so the problem you notice, such as subtracting 6 days from 20170704, will never actually occur. Since in MD SSAS you have to use MDX for the dimension itself to implement the date logic (as pointed out in @ S4V1N's answer above). Calendar.Date.PrevMember (6). And for more complex things, you can create all types of date hierarchies and go into MDX ParallelPeriod and FirstChild and stuff like that.

For the datastore you are going to use with SQL, your question is of great relevance. I think in this case @ S4V1N's answer still applies: restrict the date logic to the dimension side

  • because where it's already implemented (perhaps with pre-built calendars and fiscal hierarchies).
  • Because your logic will run an order of magnitude fewer lines.

I'm very glad the fact tables are INT-style date based: but that's because I'm using MD SSAS. It's possible AdventureWorksDW was originally built with MD SSAS in mind (where whether or not the key used in the fact tables is applicable to SQL is irrelevant), although the MS emphasis seems to have recently switched to Tabular SSAS. Or, using INT keys for date keys could have been a "design decision" for preventing date actions on the fact tables themselves, as opposed to the Date dimension.

0


source


The thread is pretty old, but my two cents.

On one of the clients I was working on, the selected project was an int column. The reason someone, before I joined, was because there were imports from various sources - some of which included time information and some that only provided date information (both lines, to begin with).

With an int key, we could store the date / date information in the datetime column in the fact table, while still having a second date-only column (datatype: date / date-time) and use that to join the Dim table. Thus, (a) aggregations / measures will be less involved (b) we will not prematurely discard information about time that may be relevant at some point and (c) at this point, if required, the date dimension can be reorganized into include time OR a new DateTime dimension can be created.

However, this was an acceptable compromise, but it cannot be a universal recommendation.

0


source







All Articles