Is the index of whole columns faster than the index of the row of the column in the case of an index range scan?

Question

Is the index of whole columns faster than the index of the row of the column in the case of an index range scan?

I have a task of implementing a database on SQL Server

, where there is a table A

that contains a column - yearMonth

. I don't need date operations like calculating the number of days or months between two dates, etc. Currently. YearMonth can be defined as Date

, Int

or varchar(6)

. In terms of saving data space, 4 bytes Int

seems to be the best choice, as only a 6 digit int is required, for example. 201701

... While it varchar(6)

takes 6 bytes, it Date

takes 2x4 bytes. (applies to most databases)

But what about in terms of indexing, especially. in case of index range scan?

If the column is yearMonth

defined as varchar(6)

, an index range scan can occur when using the queryselect .. from A where yearMonth IN (...)
If the column is yearMonth

defined as Int

or Date

, then the index range scan can occur with a type operator <=

, <=

etc.

In the above cases, which type of column definition is more efficient when scanning an index range?

+3

sql sql-server indexing sql-tuning

Rui Apr 07 17 at 8:50

source to share

3 answers

GarethD · Answer 1 · 2017-04-07T09:21:06+0000

Most (if not all) DBMSs store the date as an integer anyway, and for DateTime it's two integers, one for the date and one for the time, so there won't be any difference between them. I think your biggest focus will be on how you are going to use the column if you want to do any operations on the date on the column and then store it as a date (default is 1st day of month). For example, if you want to know how many months are between 201604

AND, 201701

it's easier to use a date, if you want to format your value as something like April 2017

, it's much easier if it is stored as a date.

Another consideration is validation, if you have varchar (6) or int, you need additional check constraints to ensure that any entered value is indeed a valid date, anyone can easily enter 999999

, and for a year indeed, there is no month, and for the warcharger, the possibilities are endless for the nonsense that can be introduced.

Now that you marked SQL Server, I can answer more definitively - both tags DATE

and INT

take 4 bytes of memory, so no space is saved there, and from testing both are performed almost exactly the same (the date is performed slightly, but not significantly better and often with less number of reads), so there is no benefit to using int (unless you want it to be limited to only valid dates)

I ran some quick tests using the following schema:

CREATE TABLE dbo.TDate (ID INT IDENTITY(1, 1) PRIMARY KEY, DT DATE NOT NULL);
INSERT dbo.TDate (DT)
SELECT TOP 100000 DATEADD(MONTH, RAND(CHECKSUM(NEWID())) * 300, '20000101')
FROM sys.all_objects a, sys.all_objects b;

CREATE NONCLUSTERED INDEX IX_TDate_DT ON dbo.TDate (DT);

CREATE TABLE dbo.TInt(ID INT IDENTITY(1, 1) PRIMARY KEY, DT INT NOT NULL);
INSERT dbo.TInt (DT)
SELECT (DATEPART(YEAR, DT) * 100) + DATEPART(MONTH, DT)
FROM dbo.TDate;

CREATE NONCLUSTERED INDEX IX_TInt_DT ON dbo.TInt (DT);

Then run this to compare performance

DECLARE @D1 DATE = (SELECT TOP 1 DT FROM dbo.TDate ORDER BY NEWID());
DECLARE @D2 DATE = (SELECT TOP 1 DT FROM dbo.TDate WHERE DT > @D1 ORDER BY NEWID());
DECLARE @I1 INT = (DATEPART(YEAR, @D1) * 100) + DATEPART(MONTH, @D1),
        @I2 INT = (DATEPART(YEAR, @D2) * 100) + DATEPART(MONTH, @D2);


SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT  COUNT(*)
FROM    dbo.TDate
WHERE   DT >= @D1
AND     DT < @D2;

SELECT  COUNT(*)
FROM    dbo.TInt
WHERE   DT >= @I1
AND     DT < @I2;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

Biswabid · Answer 2 · 2017-04-07T08:54:33+0000

Int

should be faster than Varchar(6)

that as it takes up less space.

SQL SELECT speed int vs varchar

TheGameiswar · Answer 3 · 2017-04-07T09:05:26+0000

But what about in terms of indexing, especially. in case of index range scan?

Range scan robustness is not limited by the type of index, but its performance is limited by fragmentation. Your range scan request will be faster if the fragmentation is less, less fragmentation means all pages are contiguous and not scattered.

Is the index of whole columns faster than the index of the row of the column in the case of an index range scan?

More articles: