Using Clustered and Nonclustered Index for Big Data in SQL

Given the following SQL Server table:

  • Employee (ssn, name, dept, manager, salary)

where ssn is the primary key.

Suppose there are 30 employee records on each disk block. Each employee belongs to exactly one department. Explain why you should or should not put a nonclustered index on dept to speed up the following query in each of the two cases listed after it:

SELECT ssn
FROM Employee
WHERE dept = 'IT'

      

  • when there are 50 departments
  • when there are 5000 departments

My basic understanding of clustered and non-clustered indexes in SQL Server is that a clustered index should be used when a large amount of data is returned, because it keeps the table sorted by the index key. Therefore, I believe that in the second scenario, with 5000 departments, you should not put a non-clustered index on dept to speed up the query.

I'm confused about the first scenario: since there are only 50 departments, does it really matter whether a non-clustered or a clustered index is used? The only reason I can think of is that the clustered index might take extra time to sort the data first, while the non-clustered index does not.

Which index, clustered or non-clustered, should be used in these two cases?

2 answers


Which index, clustered or non-clustered, should I use in these two cases?

With ssn as the clustered primary key, a nonclustered index on dept will cover the query and be the most efficient choice regardless of the number of rows returned. Remember that the clustered index key (here, the primary key) is implicitly included in the nonclustered index leaf nodes as the row locator. This avoids the need to access the data pages, which also hold columns that the query does not need.



The execution plan should show only an index seek on the nonclustered dept index, touching only the data the query requires.
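
A minimal sketch of that setup, assuming the column is named dept as in the question's query. The index name IX_Employee_dept is hypothetical and chosen only for illustration:

```sql
-- Assumption: the column is named dept, as in the question's query.
-- IX_Employee_dept is a hypothetical name used only for this example.
CREATE NONCLUSTERED INDEX IX_Employee_dept
    ON Employee (dept);

-- ssn is the clustered index key, so it is stored in every leaf row of
-- IX_Employee_dept. The query below references no other column, so the
-- index alone can answer it.
SELECT ssn
FROM Employee
WHERE dept = 'IT';
```

If the SELECT list also asked for name or salary, the index would no longer cover the query, and the plan would need either lookups into the clustered index or a scan.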

The question is missing an important parameter: how many employees are there?

With 50 departments and 100 employees, it is cheaper to scan the data than to bounce between the index and the data.

With 10,000 employees in 50 departments, bouncing between the index and the data is cheaper.



The query optimizer should be smart enough to work this out.

It also depends on whether "IT" is a big department or not.

Bottom line: provide an index and hope the optimizer doesn't mess it up.
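
One way to see which access path is cheaper on a real data set is to compare logical reads for both. This is only a sketch: it assumes a nonclustered index on dept already exists under the hypothetical name IX_Employee_dept, and the table hints are there purely to force each path for measurement, not as something to keep in production code.

```sql
-- Assumes: CREATE NONCLUSTERED INDEX IX_Employee_dept ON Employee (dept);
-- (hypothetical index name, not from the question)
SET STATISTICS IO ON;

-- Force the clustered index (index id 1), i.e. read the table itself.
SELECT ssn
FROM Employee WITH (INDEX(1))
WHERE dept = 'IT';

-- Force the nonclustered index on dept.
SELECT ssn
FROM Employee WITH (INDEX(IX_Employee_dept))
WHERE dept = 'IT';

SET STATISTICS IO OFF;
```

The Messages output reports logical reads for each statement. How far apart the two numbers are depends on the table size and on how many rows belong to 'IT', which is the information the question leaves out.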
