Which index should I use for a WHERE ... ORDER BY query in SQL?

I would like to run a query on a SQLite database that looks like

SELECT a,b,c,d FROM data WHERE a IN (1,2,3) ORDER BY b,c

      

What type / column order of index should I use to enable SQLite (or perhaps later MySQL) to do this quickly? How can I easily check whether a query is using the index (i.e. how do I interpret EXPLAIN)? Will SQLite be faster if I include d in the index?

EDIT: Here are the characteristics of the table:

  • 10,000,000 rows
  • 60 distinct values of a
  • 6,000,000 distinct values of b
  • 2,000 distinct values of c
  • no constraints
  • the table is my personal analytics data; it is only written once and then only read

PS: Is there a link where I can find out when SQLite / MySQL can use indexes?

+3




4 answers


If and only if IN (1,2,3) is a list of constants (always the same values), you can use a partial index like this:

CREATE INDEX so ON data (b,c) WHERE a IN (1,2,3)

      

Executing the query then gives this plan (explain query plan select...):

0|0|0|SCAN TABLE data USING INDEX so
0|0|0|EXECUTE LIST SUBQUERY 1

      

Note: there is no ORDER BY operation.

As a counter-test, drop the index and replace it with this one:



CREATE INDEX so ON data (a,b,c);

      

New execution plan:

0|0|0|SEARCH TABLE data USING INDEX so (a=?)
0|0|0|EXECUTE LIST SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR ORDER BY

      

Can you see the sort operation now?

I have not created any meaningful test data (just an empty table) to measure execution speed, but I think you should see the difference as soon as the index is created.

Also note that partial indexes are only supported since SQLite 3.8.0 (released 2013-08-26).
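The comparison above can be reproduced from a script. What follows is only a minimal sketch using Python's standard sqlite3 module; the column types and the helper names query_plan and needs_sort are assumptions made for illustration, not part of the question. It asks SQLite for the plan under each index and looks for the sort step.

```python
import sqlite3

QUERY = "SELECT a,b,c,d FROM data WHERE a IN (1,2,3) ORDER BY b,c"

def query_plan(conn, sql):
    """Return the human-readable 'detail' text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the last column in both old and new SQLite versions.
    return [row[-1] for row in rows]

def needs_sort(plan):
    """True if the plan contains a separate sort step for ORDER BY."""
    return any("TEMP B-TREE FOR ORDER BY" in step for step in plan)

conn = sqlite3.connect(":memory:")
# Column types are an assumption; the question does not state them.
conn.execute("CREATE TABLE data (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")

# Plain index on (a,b,c): rows are found by a, then sorted separately.
conn.execute("CREATE INDEX so ON data (a,b,c)")
plan_abc = query_plan(conn, QUERY)

# Partial index on (b,c), restricted to the constant list of a values.
conn.execute("DROP INDEX so")
conn.execute("CREATE INDEX so ON data (b,c) WHERE a IN (1,2,3)")
plan_partial = query_plan(conn, QUERY)

for name, plan in (("(a,b,c)", plan_abc), ("partial (b,c)", plan_partial)):
    print(name, "needs sort:", needs_sort(plan))
    for step in plan:
        print("   ", step)
```

The exact wording of the plan lines differs between SQLite versions, so it is safer to search for key phrases such as USING INDEX or USE TEMP B-TREE FOR ORDER BY than to compare whole lines.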

+4




A small thing to consider: how many rows are found if you filter by a IN (1, 2, 3)? If this is a large portion of the table, which may already be the case at 15% or so, using the index may even degrade performance.

Compare this to the index of a book. Let's assume the index is complete, which means all words are indexed. If you are looking for the word "and" and you use that index, you will never finish jumping from the index to the text and back again. Simply reading the book from cover to cover, scanning for "and", will certainly be faster.

It is not clear where the break-even point is, because it depends on many factors. But it lies lower than most people think. (I've already mentioned 15%, which in my experience is a good rule of thumb.)
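To see which side of that rule of thumb a filter falls on, you can measure the fraction of rows it matches. The sketch below uses Python's sqlite3 module with a small synthetic stand-in table; the 6,000 rows and the uniform spread of a over 60 values are assumptions for illustration only, so run the same SELECT against the real table to get a meaningful number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")

# Synthetic stand-in data (an assumption): 6,000 rows, a cycling evenly
# through 1..60. Real data may be distributed very differently.
conn.executemany(
    "INSERT INTO data VALUES (?,?,?,?)",
    ((i % 60 + 1, i, i % 2000, 0) for i in range(6000)),
)

# "a IN (...)" evaluates to 1 or 0 per row, so SUM counts the matching rows.
matching, total = conn.execute(
    "SELECT SUM(a IN (1,2,3)), COUNT(*) FROM data"
).fetchone()

selectivity = matching / total
print(f"{matching} of {total} rows match ({selectivity:.1%})")
```

With 60 equally frequent values of a, three of them cover 5% of the rows. That is below the 15% rule of thumb, but a skewed distribution could push the real figure much higher.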



Using an index can still be an option if the sort can be omitted. In this case, a tree index would have the columns (b, c, a). (A hash index won't help here.) Depending on the data types and the update rate, you might even consider using (b, c, a, d) as the index. The DBMS then only needs to perform an index scan, not a table scan. (If d is huge, this doesn't help much and wastes a lot of space; if d is updated very often, it might be a bad idea as well, because it doubles the workload for each update.)

The physical design of a database often depends on finding the right compromise.

OK, a lot of my answer no longer applies after your edit. However, I think it might still give you something to think about.

+1




The following index helps to get records quickly, assuming, of course, that the DBMS considers using the index faster than a full table scan. For example, if it estimates that a IN (1,2,3) will match 90% of the records in the table, it should avoid the index and just scan the full table.

CREATE INDEX idx ON data(a);

      

The following index helps to get records quickly and to sort them quickly. Again, if the DBMS thinks it is wrong to use an index at all, that index will not be used. But it becomes more likely that the index will be used, because the DBMS not only gets the records it wants to access, but also gets them already sorted by b, c within each value of a.

CREATE INDEX idx ON data(a,b,c);

      

The following index helps to get records quickly and sort them quickly without even having to access the table at all. Here all the data is present in the index, so there is no reason for the DBMS not to use it. Everything is already there: the criteria for fetching the required data, the sorting, and even the data itself.

CREATE INDEX idx ON data(a,b,c,d);
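Whether SQLite actually uses each of these three indexes for the question's query can be checked with EXPLAIN QUERY PLAN. The following is a rough sketch using Python's sqlite3 module on an empty table; the column types and the helper plan_for are assumptions made for illustration, and the planner's choices may differ once real data and ANALYZE statistics are present.

```python
import sqlite3

QUERY = "SELECT a,b,c,d FROM data WHERE a IN (1,2,3) ORDER BY b,c"

def plan_for(index_columns):
    """Create a fresh table with one index and return the query plan details."""
    conn = sqlite3.connect(":memory:")
    # Column types are an assumption; the question does not state them.
    conn.execute("CREATE TABLE data (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
    conn.execute(f"CREATE INDEX idx ON data({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    # The detail text is the last column of every plan row.
    return [row[-1] for row in rows]

# The three index definitions discussed above.
plans = {cols: plan_for(cols) for cols in ("a", "a,b,c", "a,b,c,d")}

for cols, plan in plans.items():
    print(f"idx on ({cols}):")
    for step in plan:
        print("   ", step)
```

A step that mentions COVERING INDEX means the table itself is not read. Since none of these indexes starts with b, c, each plan should still contain a USE TEMP B-TREE FOR ORDER BY step, because rows coming from several values of a have to be merged into one b, c order.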

      

+1




  • To filter by a IN (1,2,3), you need an index starting with (a, ...)
  • To sort by b, c, you need an index starting with (b, c, ...)

No single index can meet both requirements.

0

