Why is the LIKE operator so fast?

You all know what a LIKE statement in SQL looks like. For example:

select * 
from customer 
where email like '%goog%'

      

So my question is: how can the database return the result so quickly? If I had to program such a function myself, I would loop over all customers and check every email, but that is very slow. I've heard about indexes. How can a database use an index when it doesn't know the first or last letter? Or is there another way of doing it?

I don't want to program something like this. I just want to know how it works.



3 answers


I have no idea which engine you are using or what is actually under its hood, but here is some useful information on this issue:



  • Often, SQL servers use full-text search within a column to answer such queries quickly. This is done by creating an inverted index that maps each word to the "documents" (row, column) that contain it. One widely used library for this is Apache Lucene. Unfortunately, most IR (Information Retrieval) libraries do NOT support wildcards at the beginning of a query term (although they do support them elsewhere in the term), so your specific example cannot be answered from such an index.
  • You can build an index that supports a wildcard at the start of the pattern by using a suffix tree. Suffix trees are great for substring searches like your example. However, they are not well optimized for finding a string with a wildcard in the middle of it.
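
To make the suffix idea concrete, here is a minimal sketch in Python. It uses a sorted list of suffixes (a simplified stand-in for a suffix tree, not what any particular engine actually does), and the names `build_suffix_index` and `find_substring` are made up for illustration. The point it shows: every occurrence of `goog` is the start of some suffix, so a substring search becomes a prefix search on the sorted suffixes.

```python
import bisect


def build_suffix_index(emails):
    """Return a sorted list of (suffix, row_id) pairs for every suffix of every email.

    Hypothetical helper: a sorted suffix list stands in for a real suffix tree.
    """
    entries = []
    for row_id, email in enumerate(emails):
        for start in range(len(email)):
            entries.append((email[start:], row_id))
    entries.sort()
    return entries


def find_substring(entries, needle):
    """Return the sorted row ids whose email contains `needle`."""
    # All suffixes that start with `needle` form one contiguous run in the
    # sorted list, so a binary search finds the first one.
    first = bisect.bisect_left(entries, (needle,))
    rows = set()
    for suffix, row_id in entries[first:]:
        if not suffix.startswith(needle):
            break  # past the end of the matching run
        rows.add(row_id)
    return sorted(rows)


emails = ["anna@google.com", "bob@example.org", "carl@googlemail.com"]
index = build_suffix_index(emails)
print(find_substring(index, "goog"))  # row ids of the emails containing "goog"
```

The trade-off is space: this sketch stores one entry per character of every email, which is why such an index is not built by default.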


As I understand it, this style of query is not very efficient: if there is a wildcard at the start of the pattern, a full scan is required. However, if the column is indexed, the DBMS only needs to read the entire index into memory rather than check the full contents of the table, which is usually a relatively quick task.
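
A rough sketch of that idea in Python, assuming the table is a dict from row id to row and the index is a list of `(email, row_id)` pairs. Both structures and the function name `like_contains` are invented for illustration, not taken from any real DBMS. Every index entry is still checked, but only the narrow index is scanned, and full rows are fetched only for the matches.

```python
def like_contains(table, email_index, needle):
    """Emulate WHERE email LIKE '%needle%' by scanning the index, not the table."""
    # Scan every (email, row_id) entry of the index; the wide rows are not touched here.
    matching_ids = [row_id for email, row_id in email_index if needle in email]
    # Fetch the full rows only for the entries that matched.
    return [table[row_id] for row_id in matching_ids]


# Hypothetical data: row id -> full row.
table = {
    1: {"name": "Zed", "email": "zed@google.com", "address": "..."},
    2: {"name": "Bob", "email": "bob@example.org", "address": "..."},
    3: {"name": "Amy", "email": "amy@googlemail.com", "address": "..."},
}
# The index holds only the email and a pointer to the row, sorted by email.
email_index = sorted((row["email"], row_id) for row_id, row in table.items())

for row in like_contains(table, email_index, "goog"):
    print(row["name"], row["email"])
```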





Since we don't know which RDBMS you are working with, let's see how the database can benefit from an index in such a situation - and let's explore it with the book / index metaphor:

Imagine that each row of data occupies one page of a book, and each page contains an email address. At the end of the book there is an email address index: for each email address, it tells you which pages contain that address. Each page of this index contains only email addresses and page numbers. Say there are 50 email addresses per index page.

If you want to find all pages where the email address contains the letters goog, even though you don't know the first or last letter of the email address, which do you think will be easier: a) looking through every page of the whole book, or b) scanning the email index at the end of the book, noting which pages are relevant (and then going to those pages if you need more information)?
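
To put rough numbers on the metaphor, here is a small Python sketch that counts pages read under the assumptions stated in this answer (one row per book page, 50 addresses per index page). The function names and the example figures of 100,000 rows and 200 matches are made up for illustration.

```python
import math


def pages_read_full_scan(total_rows):
    """Option a): read every page of the book (one row per page)."""
    return total_rows


def pages_read_index_scan(total_rows, matching_rows, entries_per_index_page=50):
    """Option b): read the whole index, then only the pages that matched."""
    index_pages = math.ceil(total_rows / entries_per_index_page)
    return index_pages + matching_rows


# Illustrative figures only: 100,000 customers, 200 of them match '%goog%'.
print(pages_read_full_scan(100_000))        # 100000 pages
print(pages_read_index_scan(100_000, 200))  # 2000 index pages + 200 row pages = 2200
```

The index scan still touches every index entry, so it is not a clever lookup, just far less reading.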
