What size int is used for foreign key fields?

I have a star schema type database with fact tables that have many foreign keys for dimension tables. The number of entries in each dimension table is small — often less than 256 bytes, but always less than 64 KB. Fact tables usually contain hundreds of thousands of records, so I want to increase my connection speed.

I would like to use tinyints and smallints, but a colleague says that I am crazy to worry about this and just use 4 bytes of ints in each case. Who is right?

+1


source to share


5 answers


Yr employee is wrong. If four byte integers are used for foreign keys, then the primary keys in the fact table must also be 4 byte integers. And then you make the fact table wider than it needs to be, reducing the number of records that can fit on a single index page. To the extent that this changes the width of the primary key index, it will negatively impact index performance. If your primary key could be two tinyInts and 3 smallints, and you changed to five 4 byte ints, you changed the index width from 8 bytes wide to 20 bytes. Your index will have less than half of the I / O page write, and it will take twice as many logical and / or physical reads.



NOTE. As Jim McLade says, SQL Server 2008 (Enterprise or Developer edition) includes row-level compression, which means you can declare the value as a 4-byte INT, but it will store the value in the most appropriate size for each row.

+2


source


Go with 4 byte inputs and do the optimization elsewhere. Any effort you put into here will not help you make enough profit compared to the ease of coding, ease of use, and ease of maintenance offered by the simple scheme.



+4


source


On a 32-bit server, smaller ints are not going to save you anything in processor performance, even less on a 64-bit server. You may get disk savings and therefore some disk improvement, but overall the overall improvement can be sloppy.

+2


source


As always with performance issues, it depends. If your fact lines are tiny, say 20 bytes, then storing two bytes per line will save 400 bytes and allow you to fit up to 20 lines on each page. If your fact strings are larger, say 500 bytes, then you can only store 32 bytes, which doesn't matter.

The advantage of using INT over SMALLINT is that you don't have to worry about what happens if you suddenly get more rows than you expected.

SQL Server 2008 includes row-level compression, which means that you can declare the value as a 4-byte INT, but it will store the value in the best-fit type for each row.

0


source


4-byte integers for primary keys for most solutions.

If you need flexibility on where you can create a PK value and do some data replication later on, you might need to consider using uniqueidentifiers. The manual is easily created in a database, within a stored procedure, within a DAL, or elsewhere, and is guaranteed to be unique.

Sometimes this alone can give your solution additional performance without requiring a database lookup to get a new record ID. (i.e. create it in the DAL and save it right away, instead of using something like scope_identity () or @@ Identity)

Hope it helps.

0


source







All Articles