Optimizing queries based on clustered and non-clustered indexes in SQL?

Lately I've been reading about how clustered and non-clustered indexes work. My understanding in simple terms (correct me if I'm wrong):

The data structure that backs both clustered and non-clustered indexes is a B-Tree.

Clustered index: physically sorts the data based on the index column (or key). You can only have one clustered index per table. If none is specified when the table is created, SQL Server automatically creates a clustered index on the primary key column.

Q1. Since the data is physically sorted by the index, no extra space is needed. Is that right? And what happens when I drop the index I created?

Non-clustered index: the leaf nodes of a non-clustered index's tree contain the column values and a pointer (row locator) to the actual row in the database. Extra space is needed here to store the non-clustered index physically on disk. However, you are not limited to one non-clustered index per table.

Q2. Does this mean that a query on a non-clustered index column will not return the data sorted?

Q3. There is an additional lookup to find the actual row data using the pointer in the leaf node. What is the performance difference compared to a clustered index?

Exercise:

Consider the Employee table:

CREATE TABLE Employee
(
PersonID int PRIMARY KEY,
Name varchar(255),
age int,
salary int
); 


Creating the employee table this way also creates a default clustered index on the PersonID primary key.

The two most frequent queries on this table filter only on the age and salary columns. For simplicity, let's assume the table is NOT updated frequently.

For example:

select * from employee where age > XXX;

select * from employee where salary > XXXX and salary < YYYY;


Q4. What is the best way to build the indexes so that queries on both of these columns perform similarly? If I put the clustered index on age, queries on the age column will be faster and queries on the salary column will be slower.

Q5. In related posts, I have repeatedly seen that indexes (both clustered and non-clustered) should be created on columns with unique constraints. Why is this? What happens if you don't?

Thank you very much. The posts I have been reading are here:

http://javarevisited.blogspot.com/2013/08/difference-between-clustered-index-and-nonclustered-index-sql-server-database.html

http://msdn.microsoft.com/en-us/library/ms190457.aspx

Clustered vs Non-Clustered

What does clustered and non-clustered index really mean?

What are the differences between a clustered and non-clustered index?

How does database indexing work?



2 answers


For SQL Server

Q1. Additional space is only required for the clustered index if it is not unique: SQL Server adds a 4-byte uniquifier to a non-unique clustered index. This is because it uses the clustering key as the row locator in nonclustered indexes, so each key must identify exactly one row.

Q2. A nonclustered index can be read in order. This can help with queries where you specify an order. It can also make merge joins attractive. It also helps with range queries (x < col and y > col).
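To illustrate why an ordered index helps range queries, here is a minimal Python sketch that models a nonclustered index on salary as a sorted list of (salary, PersonID) pairs. It is a toy stand-in for a B-tree, not how SQL Server lays out pages, and the names `build_index` and `range_scan` are hypothetical. Two binary searches find the boundaries, and the matching entries come back already in salary order.

```python
import math
from bisect import bisect_left, bisect_right

def build_index(rows):
    """Toy nonclustered index: (salary, PersonID) pairs kept in sorted order."""
    return sorted((salary, person_id) for person_id, salary in rows)

def range_scan(index, low, high):
    """Return entries with low < salary < high, already sorted by salary."""
    start = bisect_right(index, (low, math.inf))   # skip every entry with salary <= low
    end = bisect_left(index, (high, -math.inf))    # stop at the first salary >= high
    return index[start:end]                        # one contiguous slice, no sorting

employees = [(1, 3000), (2, 1000), (3, 2000), (4, 5000)]  # (PersonID, salary)
salary_index = build_index(employees)
print(range_scan(salary_index, 1000, 5000))  # [(2000, 3), (3000, 1)]
```

This mirrors the `salary > XXXX and salary < YYYY` query from the question: the cost is two O(log n) searches plus the size of the result, whatever the table size.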

Q3. SQL Server performs additional bookmark lookups when using a nonclustered index, but only if it needs a column that is not in the index. Also note that you can use INCLUDE to add extra columns at the leaf level of the index. If an index can be used without an additional lookup, it is called a covering index.

If bookmark lookups are required, it doesn't take a large percentage of rows before scanning the entire clustered index becomes faster. The threshold depends on row size, key size, etc., but 5% of rows is a typical cut-off.
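One way to see where a cut-off like this comes from is to weigh one lookup per matching row against one pass over the clustered index. The Python sketch below is a deliberately crude, hypothetical cost model, not SQL Server's optimizer costing: it assumes a full scan reads every page once and each bookmark lookup costs `lookup_pages` page reads.

```python
def scan_cost(n_rows, rows_per_page):
    """Pages read by a full scan of the clustered index (toy model)."""
    return n_rows / rows_per_page

def lookup_cost(n_rows, fraction, lookup_pages=1.0):
    """Pages read when every matching row needs its own bookmark lookup."""
    return n_rows * fraction * lookup_pages

def tipping_point(rows_per_page, lookup_pages=1.0):
    """Fraction of rows where lookups cost as much as a full scan.

    Solving n * f * lookup_pages == n / rows_per_page for f; n cancels out.
    """
    return 1.0 / (rows_per_page * lookup_pages)

for rpp in (20, 50, 100):
    print(rpp, tipping_point(rpp))  # 20 0.05, then 50 0.02, then 100 0.01
```

The point of the sketch is only the shape of the result: the more rows fit on a page, the cheaper the scan and the lower the cut-off. Real optimizers also weigh random versus sequential I/O and caching, which is why 5% is only a typical figure.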

Q4. If the most important thing in your application is making both of these queries as fast as possible, you can create a covering index for each of them:

create index IX_1 on employee (age) include (name, salary);
create index IX_2 on employee (salary) include (name, age);


Note that you don't need to explicitly include the clustering key (PersonID), because the nonclustered index already contains it as the row locator.
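SQL Server's `INCLUDE` clause has no direct equivalent in SQLite, but the covering-index idea can be tried locally with Python's built-in `sqlite3` module by putting the extra columns into a composite index. This is a sketch under that assumption, not SQL Server behaviour, and the index names are illustrative. It declares `PersonID INTEGER PRIMARY KEY`, which in SQLite makes the column an alias for the rowid that every index entry already carries (plain `int PRIMARY KEY` would not). That is similar to the note about the clustering key, so `SELECT *` can be answered from the index alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        PersonID INTEGER PRIMARY KEY,  -- rowid alias: stored in every index entry
        name     TEXT,
        age      INTEGER,
        salary   INTEGER
    );
    -- Composite indexes stand in for SQL Server's INCLUDE columns.
    CREATE INDEX ix_age    ON employee (age, name, salary);
    CREATE INDEX ix_salary ON employee (salary, name, age);
""")
conn.executemany(
    "INSERT INTO employee (name, age, salary) VALUES (?, ?, ?)",
    [("Ann", 25, 1000), ("Bob", 41, 3000), ("Cy", 35, 2500)],
)

def plan(sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

age_plan = plan("SELECT * FROM employee WHERE age > ?", (30,))
salary_plan = plan("SELECT * FROM employee WHERE salary > ? AND salary < ?", (1500, 3500))
print(age_plan)     # should mention: USING COVERING INDEX ix_age
print(salary_plan)  # should mention: USING COVERING INDEX ix_salary
```

The exact plan wording differs between SQLite versions, but "COVERING INDEX" means no lookup into the table itself was needed.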



Q5. Uniqueness matters more for clustering keys than for nonclustered keys, because of the uniquifier. The real issue is whether the index is selective for your queries. Imagine an index on a bit column: unless the distribution of the data is highly skewed, such an index is unlikely to be used for anything.
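A quick way to see the selectivity argument is to compute, for each value, the fraction of rows it matches and compare that with the roughly 5% bookmark-lookup cut-off mentioned earlier in this answer. The Python sketch below uses made-up data and a hypothetical `selectivity` helper. It is not how any optimizer actually gathers statistics.

```python
from collections import Counter

CUTOFF = 0.05  # the "typical" threshold from the answer, not a real server setting

def selectivity(column):
    """Fraction of rows matched by each distinct value in the column."""
    counts = Counter(column)
    return {value: n / len(column) for value, n in counts.items()}

even_bits = [0, 1] * 500            # 50/50 split over 1000 rows
skewed_bits = [0] * 990 + [1] * 10  # 99%/1% split over 1000 rows

for name, col in (("even", even_bits), ("skewed", skewed_bits)):
    for value, frac in sorted(selectivity(col).items()):
        print(name, value, frac, "index seek" if frac < CUTOFF else "scan")
```

With an even split, neither value gets under the cut-off. With a skewed split, only a search for the rare value does.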


Some more information about the uniquifier. Imagine you have both a non-unique clustered index on age and a nonclustered index on salary. Say you had the following rows:

age | salary | uniqifier
20  | 1000   | 1
20  | 2000   | 2


Then the salary index would store entries like this:

1000 -> 20, 1
2000 -> 20, 2


Suppose you run select * from employee where salary = 1000 and the optimizer decides to use the salary index. It will find the pair (20, 1) from the index lookup, and then look up that key in the main data (the clustered index).
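The two-step lookup in this example can be modelled in a few lines of Python. This is a toy sketch, not SQL Server's on-disk format. The clustered index is a sorted list keyed by (age, uniquifier), and the salary index stores that composite key as its row locator. The helper names are hypothetical, and uniquifiers are numbered from 1 to match the example rows.

```python
from bisect import bisect_left

def build_clustered(rows):
    """Key each row by (age, uniquifier) and keep the list sorted by that key."""
    seen = {}
    clustered = []
    for row in rows:
        seen[row["age"]] = seen.get(row["age"], 0) + 1   # rows sharing an age get 1, 2, ...
        clustered.append(((row["age"], seen[row["age"]]), row))
    clustered.sort(key=lambda entry: entry[0])
    return clustered

def build_salary_index(clustered):
    """Nonclustered index: salary -> clustering key (age, uniquifier)."""
    return sorted((row["salary"], key) for key, row in clustered)

def find_by_salary(salary, salary_index, clustered):
    keys = [key for key, _ in clustered]             # toy stand-in for the clustered B-tree
    results = []
    i = bisect_left(salary_index, (salary,))         # 1) seek in the salary index
    while i < len(salary_index) and salary_index[i][0] == salary:
        locator = salary_index[i][1]                 # e.g. (20, 1)
        j = bisect_left(keys, locator)               # 2) seek in the clustered index
        results.append(clustered[j][1])
        i += 1
    return results

rows = [{"age": 20, "salary": 1000}, {"age": 20, "salary": 2000}]
clustered = build_clustered(rows)
salary_index = build_salary_index(clustered)
print(salary_index)                                   # [(1000, (20, 1)), (2000, (20, 2))]
print(find_by_salary(1000, salary_index, clustered))  # [{'age': 20, 'salary': 1000}]
```

Without the uniquifier, the locator (20) would match both rows, which is why a non-unique clustering key needs it.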



I don't know about the internals of Microsoft SQL Server, but I can answer for MySQL, which you tagged your question with. Details may vary in other implementations.

Q1. That's right, no additional space is required for the clustered index.

What happens if you drop the clustered index? MySQL InnoDB always uses the primary key (or, failing that, the first unique key whose columns are all NOT NULL) as the clustered index. If you define a table with neither, or you drop the primary key of an existing table and no such unique key remains, InnoDB generates an internal artificial key for the clustered index. This internal key has no logical column you can refer to.

Q2. The order of the rows returned by a query that uses a nonclustered index is not guaranteed. In practice, it is the order in which the rows were accessed. If you need rows returned in a specific order, you must use ORDER BY in your query. If the optimizer can infer that your desired order is the same as the order in which it will access the rows (index order, whether a clustered or nonclustered index), it can skip the sort step.
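SQLite (not MySQL) makes this sort-skipping behaviour easy to observe from Python's standard `sqlite3` module: when an index delivers rows in the requested order, `EXPLAIN QUERY PLAN` no longer shows a `USE TEMP B-TREE FOR ORDER BY` step. This is a minimal sketch assuming a throwaway in-memory table. Plan wording differs between SQLite versions, and in MySQL the equivalent signal is `Using filesort` in the Extra column of `EXPLAIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (PersonID INTEGER PRIMARY KEY, age INTEGER, salary INTEGER)")

def plan(sql):
    """Join the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT PersonID, salary FROM employee WHERE salary > 1000 ORDER BY salary"
before = plan(query)   # no index on salary: full scan, then a separate sort
conn.execute("CREATE INDEX ix_salary ON employee (salary)")
after = plan(query)    # index is read in salary order, so the sort step disappears
print(before)
print(after)
```

The `ORDER BY` is still written in the query. The optimizer drops the sort only because it can prove the index already returns that order.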

Q3. An InnoDB nonclustered index leaf does not store a pointer to the corresponding row; it stores the primary key value. So a nonclustered index lookup is really two B-tree lookups: the first descends the nonclustered index to a leaf, and the second looks up that primary key in the clustered index.
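To put rough numbers on the two lookups, the Python sketch below estimates B-tree depth from the number of entries and a fan-out (entries per page), then adds up the page visits. The fan-out of 500 and the 10 million rows are illustrative assumptions, not InnoDB defaults.

```python
def btree_depth(n_entries, fanout):
    """Smallest depth with fanout**depth >= n_entries (toy estimate)."""
    depth, capacity = 1, fanout
    while capacity < n_entries:
        capacity *= fanout
        depth += 1
    return depth

ROWS, FANOUT = 10_000_000, 500             # hypothetical table size and fan-out
clustered = btree_depth(ROWS, FANOUT)      # primary-key lookup: one descent
secondary = btree_depth(ROWS, FANOUT)      # nonclustered index descent
print("primary-key lookup pages:", clustered)             # 3
print("secondary lookup pages:", secondary + clustered)   # 6
```

With equal depths, the secondary path visits twice as many pages. In practice the upper levels are usually cached, so the real I/O difference is smaller.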



That's roughly double the cost of a single B-tree search, which is one reason InnoDB has an additional feature called the Adaptive Hash Index (AHI). Frequently searched values get cached in the AHI, and the next time a query searches for a cached value it can do an O(1) lookup. The AHI entry points directly to the clustered index leaf, so it eliminates both B-tree lookups, at least some of the time.
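The effect of a hash cache in front of the two B-tree descents can be sketched in Python. This is a hypothetical model meant only to show the O(1) shortcut. Real InnoDB builds the Adaptive Hash Index automatically from observed access patterns, whereas this class caches every value on first use and uses dictionaries as stand-ins for both B-trees.

```python
class CachedSecondaryLookup:
    """Toy secondary-index lookup with a hash cache in front of it."""

    def __init__(self, secondary, clustered):
        self.secondary = secondary   # indexed value -> primary key (B-tree stand-in)
        self.clustered = clustered   # primary key -> row (B-tree stand-in)
        self.hash_cache = {}         # indexed value -> row (AHI stand-in)
        self.btree_descents = 0

    def find(self, value):
        if value in self.hash_cache:          # cache hit: no B-tree descent at all
            return self.hash_cache[value]
        pk = self.secondary[value]            # descent 1: nonclustered index
        row = self.clustered[pk]              # descent 2: clustered index
        self.btree_descents += 2
        self.hash_cache[value] = row
        return row

lookup = CachedSecondaryLookup({"Ann": 1}, {1: ("Ann", 25, 1000)})
lookup.find("Ann")
lookup.find("Ann")
print(lookup.btree_descents)  # 2: only the first call descended the trees
```

How much this saves overall depends on how often the same values repeat, which is exactly the caveat in the next paragraph.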

How much this improves overall performance depends on how often you search for values that were searched before. In my experience, a ratio of hash searches to non-hash searches of roughly 1:2 is typical.

Q4. Build indexes to serve the queries you need to optimize. Usually the clustered index is the primary key or a unique key, and in InnoDB at least this is required. Neither age nor salary is likely to be unique.

You may like my presentation, How to Design Indexes, Really.

Q5. InnoDB automatically creates an index when a unique constraint is declared. You cannot have the constraint without an index behind it. Without an index, how would the engine ensure uniqueness when you insert a value? It would have to scan the entire table for a duplicate value in that column. The index makes uniqueness checks much more efficient.
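SQLite behaves the same way in this respect, which makes it easy to check from Python's standard `sqlite3` module (this is SQLite, not InnoDB). Declaring a `UNIQUE` constraint creates an internal index automatically, and that index is what rejects a duplicate insert. The `account` table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

# PRAGMA index_list rows start with (seq, name, unique, ...).
indexes = [(row[1], row[2]) for row in conn.execute("PRAGMA index_list('account')")]
print(indexes)  # [('sqlite_autoindex_account_1', 1)]

conn.execute("INSERT INTO account (email) VALUES ('a@example.com')")
duplicate_rejected = False
try:
    conn.execute("INSERT INTO account (email) VALUES ('a@example.com')")
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True  # the unique index found the existing value
    print("rejected:", exc)
```

No index was created by hand. The constraint alone produced `sqlite_autoindex_account_1`.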
