What causes the LIKE operator to ignore html tags, words after commas, or end in periods?

I am working on a search module that looks for text columns that contain html code. The queries are structured as follows: WHERE htmlcolumn LIKE '% searchterm%';

By default, modules search with spaces at both ends of search queries, with character characters at the beginning and / or at the end of the search, these spaces are removed (* searchterm -> LIKE '% searchterm%; I also added the ability to exclude results with certain words (-searchterm → NOT LIKE '% searchterm%') So far so good.

The problem is that words that are preceded by the html tag were not found ( <br/>

searchterm was not found when searching in LIKE% searchterm .. as well as words that appear after a comma or end with a period, etc.). What I would like to do is search for words that are not preceded or followed by the characters AZ and az. All other characters are fine.

Any ideas how I should achieve this? Thank!

+1


source to share


4 answers


Have a look at MySQLs full text search, it can use non-letter characters as delimiters. This will be much faster than searching for% term% as it requires a full table scan.



+2


source


You can use regex: http://dev.mysql.com/doc/refman/5.0/en/regexp.html



+1


source


I don't think the SQL "LIKE" statement is the right tool for the job you are trying to do. Consider using Lucene or something like that. I was able to integrate Lucene.NET into my application after a couple of days. You will spend more time than trying to save your current approach.

If you have no choice but to take your current approach, consider storing text in two columns in the database. The first column is for clear text, punctuation, etc. The second column is text, preprogrammed, just words, no punctuation, normalized to be easier for your "LIKE" approach.

+1


source


Generally speaking, it is better to use full-text search tools, but if you really want a little SQL, here it is:

SELECT * FROM `t` WHERE `htmlcolumn` REGEXP '[[:<:]]term[[:>:]]'

      

It returns all records containing the word "term", whether it is surrounded by spaces, punctuation, special characters, and so on.

+1


source







All Articles