How can I make MySQL full-text indexing ignore URL strings, especially the file extension?

I am indexing strings that contain URLs with a MySQL full-text index, but I don't want matches inside the URLs to be included in the results.

As an example, I'm searching for "PHP" or "HTML" and I get entries like "Massage Company Ibiza Angels" (see funandfrolicks.php) ... a hedonistic distraction at best.

I can't find any examples of adding a regex to the stopword list.

The other approach I tried (and failed with) was a boolean-mode full-text query that decreases the word's contribution ... however, with the following SQL the relevance value doesn't change.

SELECT title, content, MATCH(title, content) AGAINST('+PHP >".php"' IN BOOLEAN MODE)
FROM tb_feed
WHERE MATCH(title, content) AGAINST('PHP >".php"' IN BOOLEAN MODE)
ORDER BY published DESC LIMIT 10;

      

An alternative is a messy SQL statement with an extra condition ...

WHERE ... IF(content REGEXP '[.]php', content REGEXP '(^| )php', 1) ...

      

Thoughts ... What's the best solution?



2 answers


If the number of results is manageable, you can simply choose not to display the matches you want to ignore, for example .php or .html. This is a very quick kludge to implement, but it uses more memory than necessary.
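A minimal sketch of that post-filtering step, written in Python on the assumption that the application layer receives the rows from MySQL as dictionaries. The names `has_standalone_term` and `filter_rows` are hypothetical, and the rule "a term directly preceded by a dot is a file extension" is an assumption about the data.

```python
import re

def has_standalone_term(text: str, term: str) -> bool:
    """True if `term` occurs as its own word, not as an extension like ".php"."""
    # Reject matches preceded by a word character or a dot, or followed by a
    # word character. This is an illustrative rule, not a MySQL feature.
    pattern = r"(?<![\w.])" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def filter_rows(rows, term):
    """Drop rows where `term` only appears inside a URL or file name."""
    return [
        row for row in rows
        if has_standalone_term(row["title"] + " " + row["content"], term)
    ]

# Hypothetical rows, shaped like the title/content columns in the question.
rows = [
    {"title": "Massage Company Ibiza Angels", "content": "see funandfrolicks.php"},
    {"title": "Learning PHP", "content": "A PHP tutorial for beginners."},
]
kept = filter_rows(rows, "php")
```

Because the filtering happens after the query, a `LIMIT 10` may return fewer than ten usable rows, so the query would need to fetch more rows than it displays.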

Another solution is to add a second field that holds only the text you want to be searchable. In this field you leave out URLs and any other undesirable keywords, and you build the full-text index on it instead. This does not take long to write, but it uses more disk space.
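One way to produce the text for such a second field is to strip URL-like tokens before saving the row. The sketch below is in Python and rests on assumptions: the function name `strip_urls` is hypothetical, and the list of file extensions is only an example that would need extending for real data.

```python
import re

# URLs that start with a scheme or "www."
SCHEME_URL = re.compile(r"\b(?:https?://|www\.)\S+", re.IGNORECASE)

# Bare file or page names such as "funandfrolicks.php", with an optional
# path/query tail. The extension list is an assumption for illustration.
FILE_LIKE = re.compile(
    r"\b[\w-]+(?:\.[\w-]+)*\.(?:php|html?|aspx?|jsp)\b(?:[/?#]\S*)?",
    re.IGNORECASE,
)

def strip_urls(text: str) -> str:
    """Return `text` with URL-like tokens removed and whitespace collapsed."""
    text = SCHEME_URL.sub(" ", text)
    text = FILE_LIKE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

searchable = strip_urls("Massage Company Ibiza Angels (see funandfrolicks.php)")
```

The returned string would be stored in the extra column at insert or update time, so the original content stays untouched for display.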



The best solution is to create a separate keywords table (or similar). When a user submits a search query, the search is run against the keywords table. The keywords table is populated by splitting the input into words when content is loaded or retrieved.

This last option has the advantage of potentially being fast, and the data is compact, since each keyword is stored only once with an index pointing to the main content records. It also lets you do smarter searches if you want to.
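The keyword approach can be sketched in Python with an in-memory dictionary standing in for the keywords table. Everything here is illustrative: `extract_keywords`, `build_keyword_index`, and `search` are hypothetical names, and treating every dotted token as URL-like is an assumption that also discards things like version numbers.

```python
import re
from collections import defaultdict

# Scheme/www URLs, or any dotted token such as "funandfrolicks.php".
# Dropping all dotted tokens is a deliberate simplification.
URL_LIKE = re.compile(r"(?:https?://|www\.)\S+|[\w-]+(?:\.[\w-]+)+", re.IGNORECASE)
WORD = re.compile(r"[a-z0-9]+")

def extract_keywords(text):
    """Split text into lowercase keywords, skipping URL-like tokens."""
    cleaned = URL_LIKE.sub(" ", text.lower())
    return set(WORD.findall(cleaned))

def build_keyword_index(records):
    """Map each keyword once to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for keyword in extract_keywords(text):
            index[keyword].add(record_id)
    return index

def search(index, term):
    """Return the sorted ids of records that contain `term` as a keyword."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical content records keyed by primary key.
records = {
    1: "Massage Company Ibiza Angels (see funandfrolicks.php)",
    2: "Getting started with PHP and HTML",
}
index = build_keyword_index(records)
matches = search(index, "PHP")
```

In MySQL the same shape would be a table of distinct keywords plus a link table of keyword and content ids, filled in whenever a content row is inserted or updated.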



If you want php / html to match only when it is not part of a URL, one easy way is to try:

LIKE '% php %'
LIKE '% html %'

This requires php / html to appear as a separate word in a sentence.
