How to optimize your database for superstring queries?

So, I have a database table in MySQL that has a column containing a string. Given a target string, I want to find all rows whose column value is a substring of the target, that is, all rows for which the target string is a superstring of the column. At the moment I'm using a query along these lines:

SELECT * FROM table WHERE 'my superstring' LIKE CONCAT('%', column, '%')

      

My concern is that this will not scale. I am currently running some tests to see whether this is a problem, but I am wondering if anyone has suggestions for an alternative approach. I briefly looked into MySQL full-text indexing, but it also seems to be designed for finding a substring in the data, rather than for finding out whether the data occurs in a given string.

+2




4 answers


Well, it looks like the answer is that you don't. This type of indexing is generally not available, and if you want it in your MySQL database, you will need to create your own MySQL extensions. The alternative I am pursuing is to do indexing in my application.
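One possible shape for such application-side indexing, not necessarily what I ended up with, is an Aho-Corasick automaton built once from the column values and then run over each target string in a single pass. The class name `SubstringIndex` and its methods are made up for this sketch, and it assumes the column values fit in memory:

```python
from collections import deque


class SubstringIndex:
    """Aho-Corasick automaton over the stored column values (the patterns)."""

    def __init__(self, patterns):
        # Parallel lists, one entry per trie node: outgoing edges,
        # failure link, and the patterns that end at this node.
        self.goto = [{}]
        self.fail = [0]
        self.out = [[]]
        for pattern in patterns:
            if pattern:  # an empty value would trivially match everything
                self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        node = 0
        for ch in pattern:
            nxt = self.goto[node].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = nxt
            node = nxt
        self.out[node].append(pattern)

    def _build_failure_links(self):
        # Breadth-first, so a node's failure target is always finished first.
        queue = deque(self.goto[0].values())  # depth-1 nodes fail to the root
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit patterns that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def find_in(self, target):
        """Return the set of stored values that occur inside target."""
        found = set()
        node = 0
        for ch in target:
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            found.update(self.out[node])
        return found


index = SubstringIndex(["mafia", "gov", "whitehouse", "a"])
print(sorted(index.find_in("www.mafia.gov.ru")))
```

The index has to be rebuilt, or at least extended, whenever the table changes, so this fits best when the column values change rarely compared to how often targets are checked.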



Thanks to all who responded!

0




You can create a temporary table with a full-text index and insert "my superstring" into it. Then you can use MySQL's full-text match syntax in a query that joins it to your persistent table. You will still be doing a full table scan on your persistent table, because you will be checking against every single row (which is what you want, right?). But at least "my superstring" will be indexed, so it will likely perform better than what you have now.



Alternatively, you can simply select column from table and do the matching in a high-level language. Depending on how many rows are in table, this approach might make more sense. Offloading heavy tasks to the client server (web server) can often be a win, as it reduces the load on the database server.
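As a rough sketch of that second approach: the snippet below uses Python's built-in sqlite3 module purely as a stand-in so that it runs on its own; with MySQL you would use whichever connector you already have. The table name `rules`, the column name `pattern`, and the function name are placeholders.

```python
import sqlite3


def rows_contained_in(conn, target):
    """Return the column values that occur as substrings of target."""
    # Plain full scan: the database only streams the values,
    # the substring test itself runs in the application.
    cursor = conn.execute("SELECT pattern FROM rules")
    return [value for (value,) in cursor if value and value in target]


# In-memory SQLite database standing in for the real MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rules (pattern TEXT)")
conn.executemany(
    "INSERT INTO rules (pattern) VALUES (?)",
    [("super",), ("string",), ("database",)],
)

matches = rows_contained_in(conn, "my superstring")
print(sorted(matches))
```

If the same set of rows is checked against many targets, it is probably worth caching the selected values in the application rather than re-reading the table each time.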

+1




If your superstrings are URLs and you want to find substrings in them, it would help to know whether your substrings always line up with the dot-separated parts of the URL.

For example, you have superstrings:

www.mafia.gov.ru
www.mymafia.gov.ru
www.lobbies.whitehouse.gov

If your rules contain "mafia" and you want the first two to match, then what I say does not apply.

Otherwise, you can split your URLs into parts such as ['www', 'mafia', 'gov', 'ru']. Then it becomes much easier to look up each part in your table, because every lookup is an exact match.
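A minimal sketch of that idea, assuming the rules are loaded into a Python set (the names `matching_rules` and `rules` are invented for the example):

```python
def matching_rules(url_host, rules):
    """Return the rules that equal one of the host's dot-separated parts."""
    tokens = set(url_host.lower().split("."))
    return tokens & rules


# Hypothetical rule set; in practice this would come from the table.
rules = {"mafia", "whitehouse"}

for host in ["www.mafia.gov.ru", "www.mymafia.gov.ru", "www.lobbies.whitehouse.gov"]:
    print(host, sorted(matching_rules(host, rules)))
```

Here "www.mymafia.gov.ru" matches nothing, because "mymafia" is not equal to "mafia". Done on the database side, the same lookup becomes an exact-match query such as column IN ('www', 'mafia', 'gov', 'ru'), which an ordinary index on the column can serve.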

0




I created a search solution using views that needed to be robust enough to grow with customer needs. For example:


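-- Note: this is SQL Server (T-SQL) syntax; MySQL would use AUTO_INCREMENT, IFNULL and CONCAT.
CREATE TABLE tblMyData
(
    MyId  bigint identity(1,1),
    Col01 varchar(50),
    Col02 varchar(50),
    Col03 varchar(50)
);
GO

-- Concatenate every searchable column into a single SearchData string.
CREATE VIEW viewMySearchData
AS
SELECT
    MyId,
    ISNULL(Col01, '') + ' ' +
    ISNULL(Col02, '') + ' ' +
    ISNULL(Col03, '') + ' ' AS SearchData
FROM tblMyData;
GO

-- Search the combined string, then join back to get the original columns.
SELECT
    t1.MyId,
    t1.Col01,
    t1.Col02,
    t1.Col03
FROM tblMyData t1
INNER JOIN viewMySearchData t2
    ON t1.MyId = t2.MyId
WHERE t2.SearchData LIKE '%search string%';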


      

If they then decide to add columns to tblMyData and want those columns to be searched as well, modify viewMySearchData to add the new columns to the "AS SearchData" section.

If they decide there are too many columns in the search, just change viewMySearchData to remove the unneeded columns from the "AS SearchData" section.

0

