When to use an index

Is there a general rule or formula as to when it would be useful to use an index?

For example, in the following case, it would be obvious to add INDEX to the column id

:

SELECT * FROM table WHERE id = '1iuhiasdf89384h'

      

However is_qualified

, the following might not be helpful for, if it has 2 possible values:

SELECT * FROM table WHERE is_qualified=1

      

How about an element that has 10 [0-9] possible values? 100 values ​​[0-99]? When do you usually want to add an index and how does this relate to cardinality

?

+3


source to share


3 answers


Typically, the reason the optimizer prefers to use an index in a clause where

is to reduce the number of data pages read. Thus, if your data is randomly distributed across pages, the important question is how many records are on each page and how many records are selected by the filter.

Let's say there are 100 records on each page. Then a random selection of one percent of the records will probably select (almost) all pages. In this situation, reading pages and filtering on pages is likely to be faster than using an index, because all data pages will be read anyway. T

So, for most tables, a query that returns one or more records will do better with an index. Queries returning many records may not use the index. The consequence is that for small tables, indexes can never be useful. If the data is valid for a single page, crawling the page with a filter where

can be as fast as using an index.



That is, if the selectivity of the query is greater than the inverse of the average number of records per data page, then the index is probably not useful. This comes as close to the "general" rule as possible, but read on.

The type of index also matters. If you have a condition is_qualified

and only 0.1% of records match that condition, then the index can be useful. Or, if 1% are qualified, but the records are very large, so there are only 10 on the page, then the index is probably useful. Or, if is_qualified

is the first column in a clustered index, then all c values 1

are on multiple pages. With a clustered index, even 30% selectivity for is_qualified = 1

would mean reading only 30% of the data pages, which should reduce the time for many queries by two-thirds.

Of course, this precludes the use of indexes for joins and order of action - situations where even 100% selectivity can still use the index. However, your question seems to have focused on filtering in a sentence where

.

+2


source


I think you need to do some research and read about using indexes. Even from your own example, you expect an index on the "id" column, because you are looking for a specific ONE ... But then it doesn't matter to have an index on IS_QUALIFIED, because it can only have 2 possible values ​​... But an alphanumeric id can have billions of values.

Indexes are used to help you quickly narrow down and find records without going through raw data pages to pull a qualified record based on the general criteria you expect to output. Be aware that indexes even have multiple columns based on the types of regular queries that you expect to run.

Let's take some columns of data scripts and suppose the table is a child table by "id", has "is_qualified" and "othertype" (your values ​​are 0-9), and some other things. Perhaps such as the date of something, or a description of "another type".



if you only had an index on the ID then all the "ID" records would be grouped together, and that's ok, once you get to them, a quick run through them will give you results.

But now let's say that you are looking for all ids that have "Is_Qualified" = 1 and the column "othertype" is 3. You cannot get them quickly, there is no correlation from one to the other ... but if you have an index with multiple keys (is_qualified, othertype, id), you can quickly jump to just that data Is_Qualified = 1 and othertype = 3, and then you have all the ids ready to go.

Apply this to a table that has several million records to query, and run queries without indexes, and you will appreciate their purpose, and you will learn how to design and conceptualize indexes well.

+1


source


It is more important how many rows exist in the table, rather than whether they have 2 or 10 possible values. For a table with less than 200-300 rows, there is no need for an index. If you have 1 million rows, it is useful to have an index even if the column only has 0 and 1 - you will avoid a full scan. For example, if only 500 out of 1M rows are equal to 1, the query will skip scanning 999,500 rows.

0


source







All Articles