Is the index of whole columns faster than the index of the row of the column in the case of an index range scan?
I have a task of implementing a database on SQL Server
, where there is a table A
that contains a column - yearMonth
. I don't need date operations like calculating the number of days or months between two dates, etc. Currently. YearMonth can be defined as Date
, Int
or varchar(6)
. In terms of saving data space, 4 bytes Int
seems to be the best choice, as only a 6 digit int is required, for example. 201701
... While it varchar(6)
takes 6 bytes, it Date
takes 2x4 bytes. (applies to most databases)
But what about in terms of indexing, especially. in case of index range scan?
- If the column is
yearMonth
defined asvarchar(6)
, an index range scan can occur when using the queryselect .. from A where yearMonth IN (...)
- If the column is
yearMonth
defined asInt
orDate
, then the index range scan can occur with a type operator<=
,<=
etc.
In the above cases, which type of column definition is more efficient when scanning an index range?
source to share
Most (if not all) DBMSs store the date as an integer anyway, and for DateTime it's two integers, one for the date and one for the time, so there won't be any difference between them. I think your biggest focus will be on how you are going to use the column if you want to do any operations on the date on the column and then store it as a date (default is 1st day of month). For example, if you want to know how many months are between 201604
AND, 201701
it's easier to use a date, if you want to format your value as something like April 2017
, it's much easier if it is stored as a date.
Another consideration is validation, if you have varchar (6) or int, you need additional check constraints to ensure that any entered value is indeed a valid date, anyone can easily enter 999999
, and for a year indeed, there is no month, and for the warcharger, the possibilities are endless for the nonsense that can be introduced.
Now that you marked SQL Server, I can answer more definitively - both tags DATE
and INT
take 4 bytes of memory, so no space is saved there, and from testing both are performed almost exactly the same (the date is performed slightly, but not significantly better and often with less number of reads), so there is no benefit to using int (unless you want it to be limited to only valid dates)
I ran some quick tests using the following schema:
CREATE TABLE dbo.TDate (ID INT IDENTITY(1, 1) PRIMARY KEY, DT DATE NOT NULL);
INSERT dbo.TDate (DT)
SELECT TOP 100000 DATEADD(MONTH, RAND(CHECKSUM(NEWID())) * 300, '20000101')
FROM sys.all_objects a, sys.all_objects b;
CREATE NONCLUSTERED INDEX IX_TDate_DT ON dbo.TDate (DT);
CREATE TABLE dbo.TInt(ID INT IDENTITY(1, 1) PRIMARY KEY, DT INT NOT NULL);
INSERT dbo.TInt (DT)
SELECT (DATEPART(YEAR, DT) * 100) + DATEPART(MONTH, DT)
FROM dbo.TDate;
CREATE NONCLUSTERED INDEX IX_TInt_DT ON dbo.TInt (DT);
Then run this to compare performance
DECLARE @D1 DATE = (SELECT TOP 1 DT FROM dbo.TDate ORDER BY NEWID());
DECLARE @D2 DATE = (SELECT TOP 1 DT FROM dbo.TDate WHERE DT > @D1 ORDER BY NEWID());
DECLARE @I1 INT = (DATEPART(YEAR, @D1) * 100) + DATEPART(MONTH, @D1),
@I2 INT = (DATEPART(YEAR, @D2) * 100) + DATEPART(MONTH, @D2);
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT COUNT(*)
FROM dbo.TDate
WHERE DT >= @D1
AND DT < @D2;
SELECT COUNT(*)
FROM dbo.TInt
WHERE DT >= @I1
AND DT < @I2;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
source to share
Int
should be faster than Varchar(6)
that as it takes up less space.
SQL SELECT speed int vs varchar
source to share
But what about in terms of indexing, especially. in case of index range scan?
Range scan robustness is not limited by the type of index, but its performance is limited by fragmentation. Your range scan request will be faster if the fragmentation is less, less fragmentation means all pages are contiguous and not scattered.
source to share