How many fields do I need to index and how do I create them?

I have a table in a MySQL database that has the following fields:

ID | GENDER | BIRTHYEAR | POSTCODE

      

Users can search the table using any of the fields in any combination, e.g.

SELECT * FROM table WHERE GENDER = 'M' AND POSTCODE IN (1000, 2000);

or

SELECT * FROM table WHERE BIRTHYEAR = 1973;

According to the MySQL docs, a multi-column index can only be used through its leftmost prefix. So if I create one index on all 4 columns, it won't be used unless the ID field is part of the search. Do I need to create an index for every possible combination of fields (ID, ID / GENDER, ID / BIRTHYEAR, etc.), or should I create one index covering all fields?

If it matters, there are over 3 million records in this table.

+2




3 answers


In this situation, I usually log the search criteria, the number of results returned, and the time the search took. Just because you offer the flexibility to search on any field does not mean your users actually use that flexibility. I create indexes on sensible combinations first, and then, once the usage patterns are clear, I drop rarely used indexes or add new ones for patterns I did not anticipate.
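As a rough sketch of that logging approach, assuming a hypothetical `search_log` table (the table and column names are illustrative and not from the question), each search could record which fields were used, how many rows came back, and how long it took. A summary query then shows which combinations are worth indexing:

```sql
-- Hypothetical log table; every name here is an assumption for illustration.
CREATE TABLE search_log (
    log_id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    searched_at    TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    used_id        TINYINT(1)   NOT NULL DEFAULT 0,  -- 1 if ID was a search criterion
    used_gender    TINYINT(1)   NOT NULL DEFAULT 0,
    used_birthyear TINYINT(1)   NOT NULL DEFAULT 0,
    used_postcode  TINYINT(1)   NOT NULL DEFAULT 0,
    result_count   INT UNSIGNED NOT NULL,            -- rows returned by the search
    duration_ms    INT UNSIGNED NOT NULL             -- time the search took
);

-- Which field combinations are actually searched, and how slow are they?
SELECT used_id, used_gender, used_birthyear, used_postcode,
       COUNT(*)         AS searches,
       AVG(duration_ms) AS avg_ms
FROM search_log
GROUP BY used_id, used_gender, used_birthyear, used_postcode
ORDER BY searches DESC;
```

Combinations that are searched often and run slowly are the candidates for a new index. Combinations that never appear do not need one.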



I'm not sure whether MySQL supports statistics or histograms for skewed data, but an index on gender may or may not help. If MySQL supports statistics, they will indicate the selectivity of the index. In the general population, an index on a field with a 50/50 split will not help. If your sample data is, say, computer programmers, and 95% of the rows are male, then a search for females would use the index.
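One way to check selectivity by hand is to count distinct values and look at how the rows are distributed. This is a minimal sketch that assumes the table is called `people` (a hypothetical name, since the question only calls it `table`):

```sql
-- `people` is an assumed table name; the columns are those from the question.
-- Few distinct values relative to total_rows means low selectivity.
SELECT COUNT(*)                  AS total_rows,
       COUNT(DISTINCT GENDER)    AS distinct_genders,
       COUNT(DISTINCT BIRTHYEAR) AS distinct_birthyears,
       COUNT(DISTINCT POSTCODE)  AS distinct_postcodes
FROM people;

-- Distribution of one column: shows whether the split is 50/50 or skewed.
SELECT GENDER, COUNT(*) AS row_count
FROM people
GROUP BY GENDER;
```

If one gender accounts for only a small share of the rows, an index could still help searches for that value, which matches the 95% example above.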

+1




Use EXPLAIN.

(I would say use Postgres too, lol).



It looks like recent MySQL versions can use multiple indexes in one query; they call this index merge. In that case, one index per column is sufficient.

Gender is a special case: since its selectivity is about 50%, you don't need an index on it, and adding one would be counterproductive.
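A minimal sketch of the one-index-per-column approach, checked with EXPLAIN. The table name `people` and the index names are assumptions for illustration:

```sql
-- Assumed table name `people`; index names are illustrative.
CREATE INDEX idx_birthyear ON people (BIRTHYEAR);
CREATE INDEX idx_postcode  ON people (POSTCODE);
-- No index on GENDER, per the selectivity argument above.

-- Ask the optimizer how it would run a two-column search.
EXPLAIN
SELECT *
FROM people
WHERE BIRTHYEAR = 1973
  AND POSTCODE = 1000;
```

When index merge is chosen, the `type` column of the EXPLAIN output reads `index_merge` and the `key` column lists the indexes used. The optimizer may still decide that a single index is cheaper, so it is worth running EXPLAIN against the real data.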

0




Creating indexes on individual fields is helpful, and it would help most if a column were of type varchar with a different value in nearly every record. BIRTHYEAR and POSTCODE are numbers, which already index well.

You can index BIRTHYEAR because it should differ across many records (although the values will only span roughly the last 120 years, as far as I can guess).

GENDER doesn't need an index, in my opinion.

You can work out which combinations of fields are most likely to narrow down the results and index those, for example: BIRTHYEAR + POSTCODE, ID + BIRTHYEAR, ID + POSTCODE.
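As a sketch of one such combination, here is a composite index on BIRTHYEAR and POSTCODE, again assuming a hypothetical table name `people` and an illustrative index name. Because of the leftmost-prefix rule mentioned in the question, the column order matters:

```sql
-- Assumed table name `people`; index name is illustrative.
CREATE INDEX idx_birthyear_postcode ON people (BIRTHYEAR, POSTCODE);

-- Searches that include the leading column (BIRTHYEAR) can use the index.
EXPLAIN SELECT * FROM people WHERE BIRTHYEAR = 1973;
EXPLAIN SELECT * FROM people WHERE BIRTHYEAR = 1973 AND POSTCODE IN (1000, 2000);

-- A search on POSTCODE alone skips the leading column, so this index
-- typically cannot be used for a direct lookup.
EXPLAIN SELECT * FROM people WHERE POSTCODE = 1000;
```

Comparing the `key` column across the three EXPLAIN results shows which searches the composite index actually serves.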

0

