Should I just query the database, or use a proper search engine?

I have a news site that will end up with a lot of articles. I need to implement a search function, and I know that Solr is one of the most popular solutions for this today.

The site may or may not receive heavy traffic, but I have to design the search function as if it will.

What are the advantages of using a search engine like Solr instead of simply querying the database (MySQL) for the content and showing it to the user? Is it just that search engines like Solr perform better at searching and (according to what I've read) offer more flexibility? I'm not looking for answers like "use Solr"; I want an explanation of why not to use the database.



Answer


They solve different problems. Search engines are built on different underlying functionality than traditional databases (both SQL and NoSQL), because their requirements and the way they are used are different.

Today there is some overlap, since many databases offer search capabilities of their own, but if you start from standard database interactions, even "find the articles in which these three words are present" is something you have to implement by hand. Add all the other things you normally want in order to run a search and return relevant results to your users, and you have a very different problem from the one regular databases are designed to solve.
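To make "you will have to do it by hand" concrete, here is a minimal Python sketch with toy data and hypothetical helper names. `naive_search` scans every article for substrings, much as a query with one `LIKE '%word%'` condition per word would, while `build_index` and `index_search` use an inverted index (term to set of article IDs), the core data structure behind Lucene-based engines such as Solr. It is an illustration, not how any particular engine is implemented.

```python
import re
from collections import defaultdict

# Toy corpus standing in for the articles table (hypothetical data).
ARTICLES = {
    1: "Election results announced in the capital",
    2: "Capital markets react to election news",
    3: "Local team wins the championship",
}


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_search(articles, words):
    """Check every article, like one LIKE '%word%' clause per word.
    Substring matching ignores word boundaries ('cap' hits 'capital')."""
    words = [w.lower() for w in words]
    return sorted(
        doc_id
        for doc_id, text in articles.items()
        if all(w in text.lower() for w in words)
    )


def build_index(articles):
    """Inverted index: each term maps to the set of article IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in articles.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def index_search(index, words):
    """AND query: intersect the posting sets of all query terms."""
    postings = [index.get(w.lower(), set()) for w in words]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    idx = build_index(ARTICLES)
    print(naive_search(ARTICLES, ["election", "capital"]))
    print(index_search(idx, ["election", "capital"]))
```

Both functions return `[1, 2]` for `["election", "capital"]`, but only the naive scan also "matches" a fragment such as `"cap"`. The index only looks up one posting set per query term instead of scanning every row, and everything else in this answer (ranking, normalization, facets, highlighting) still has to be built on top of it.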

A few of the features that search engines focus on:

Scoring and field weighting. A match in "title" should probably be weighted more heavily than a match in "text". You can also have an age factor that affects the score, so, depending on the use case, all of these weights between fields and functions can be tuned to solve just about any problem you have.
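As a rough illustration of field weighting combined with an age factor, here is a toy scoring function. The weights (`title` 3.0, `text` 1.0), the 30-day half-life and the `Article` type are made-up assumptions, and the formula is deliberately simplistic: Solr's default relevance model (BM25) is more involved, and there you would configure boosts and recency functions rather than hand-code them.

```python
import re
from dataclasses import dataclass

# Assumed, hand-picked weights: a title hit counts three times as much as a body hit.
FIELD_BOOSTS = {"title": 3.0, "text": 1.0}
# Assumed recency setting: an article's score halves every 30 days.
HALF_LIFE_DAYS = 30.0


@dataclass
class Article:  # hypothetical document type
    title: str
    text: str
    age_days: float


def term_count(field_text: str, term: str) -> int:
    """Number of times `term` occurs as a whole token in `field_text`."""
    return re.findall(r"[a-z0-9]+", field_text.lower()).count(term.lower())


def score(article: Article, terms: list[str]) -> float:
    """Toy relevance: boosted per-field term counts times an exponential age decay.
    Not BM25; only meant to show field weights plus a recency factor."""
    base = sum(
        boost * term_count(getattr(article, field), term)
        for field, boost in FIELD_BOOSTS.items()
        for term in terms
    )
    recency = 0.5 ** (article.age_days / HALF_LIFE_DAYS)
    return base * recency


if __name__ == "__main__":
    articles = [
        Article("Budget vote delayed", "The budget vote was delayed again.", 0),
        Article("Parliament news", "Budget talks continue; budget vote next week.", 0),
        Article("Budget vote delayed", "The budget vote was delayed again.", 30),
    ]
    for a in sorted(articles, key=lambda a: score(a, ["budget"]), reverse=True):
        print(f"{score(a, ['budget']):.1f}  {a.title} ({a.age_days:.0f} days old)")
```

With these settings the fresh article with "budget" in its title scores 4.0, the article that only mentions "budget" twice in its body scores 2.0, and the title match drops to 2.0 once it is 30 days old. Tuning is a matter of changing the weights and the decay, not rewriting queries.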

Normalization and text processing. You might want to expand synonyms when indexing. Searching for "ipod" and "i-pod" should probably return the same results, as should "Windows" and "windows". These operations are fundamental to most search engines. You might also want a field that does phonetic matching (how words are pronounced, not how they are spelled), and you could weight such matches differently from exact matches. Solr's list of analyzers, tokenizers and filters can give you an idea of some of the available text-processing features.
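A toy version of such an analysis chain, assuming a hand-written synonym table (hypothetical) and synonym expansion at index time only: every token is lowercased and stripped of hyphens, so "i-pod", "iPod" and "ipod" become the same term. In Solr this is configured declaratively in the schema as a tokenizer plus filters rather than coded by hand; phonetic matching is left out here for brevity.

```python
import re

# Hypothetical synonym table: each key expands to all listed terms at index time.
SYNONYMS = {
    "tv": ["tv", "television"],
    "television": ["tv", "television"],
}


def analyze(text, expand_synonyms=False):
    """Toy analyzer: tokenize, lowercase, join hyphenated words, optionally expand synonyms."""
    tokens = []
    for raw in re.findall(r"[\w-]+", text):
        token = raw.lower().replace("-", "")  # 'I-Pod' -> 'ipod', 'Windows' -> 'windows'
        if not token:  # a bare '-' or '--' leaves nothing behind
            continue
        if expand_synonyms:
            tokens.extend(SYNONYMS.get(token, [token]))
        else:
            tokens.append(token)
    return tokens


def matches(query, document):
    """AND match: every normalized query term must appear among the indexed terms."""
    doc_terms = set(analyze(document, expand_synonyms=True))
    return all(term in doc_terms for term in analyze(query))


if __name__ == "__main__":
    print(matches("i-pod", "New iPod deals"))
    print(matches("television", "Cheap TV sets"))
    print(matches("Windows", "windows update"))
```

Because both the indexed text and the query go through the same normalization, the user's spelling or capitalization stops mattering; with a plain `WHERE` clause, every one of these variants would have to be anticipated in the query itself.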



Faceting / navigators. How many of the documents in my search result have each of the different values in field xyz, and what are those values? You've probably seen this feature on many sites, such as "filter by file type" or "only show hits from the last 7 days, last 31 days, last 365 days", with the number of matching documents shown next to each option.
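A minimal sketch of what facet counts are, computed over an in-memory result set. The `section` and `age_days` fields and the sample hits are hypothetical. A search engine typically returns counts like these alongside the results in the same request, whereas with a database you would usually run an extra `GROUP BY` query per facet.

```python
from collections import Counter

# Hypothetical search hits: each has a 'section' field and an age in days.
HITS = [
    {"id": 1, "section": "politics", "age_days": 2},
    {"id": 2, "section": "sports", "age_days": 10},
    {"id": 3, "section": "politics", "age_days": 40},
    {"id": 4, "section": "business", "age_days": 400},
]


def field_facet(hits, field):
    """Count how many hits have each distinct value of `field`."""
    return Counter(hit[field] for hit in hits)


def range_facets(hits, windows=(7, 31, 365)):
    """Count hits inside each 'last N days' window; the windows overlap."""
    return {
        f"last {n} days": sum(1 for hit in hits if hit["age_days"] <= n)
        for n in windows
    }


if __name__ == "__main__":
    for value, count in field_facet(HITS, "section").most_common():
        print(f"{value} ({count})")
    for label, count in range_facets(HITS).items():
        print(f"{label} ({count})")
```

Each printed line corresponds to one clickable filter with its document count, which is exactly the "filter by ... (N)" navigation described above.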

Highlighting. Which part of the text matched, and extracting the right snippet to return to the end user for display. You see this every time you search on Google: the text below each hit shows the actual content from the page where your query matched.
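A deliberately naive highlighter, to show what the feature involves: find the first whole-word, case-insensitive match, cut a window of `context` characters around it, and wrap every match inside that window in a tag. The function name, the `<em>` tag and the window size are assumptions for illustration; real highlighters also handle stemmed and synonym matches, pick the best-scoring fragment rather than the first, and avoid cutting words in half.

```python
import re


def highlight(text, terms, context=30, tag="em"):
    """Return a snippet around the first matching term with matches wrapped in <tag>,
    or None if nothing matches. Toy version: whole-word, case-insensitive only."""
    if not terms:
        return None
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    first = pattern.search(text)
    if first is None:
        return None
    start = max(0, first.start() - context)
    end = min(len(text), first.end() + context)

    # Walk matches in the full text so word boundaries are judged on the original,
    # keeping only those that lie entirely inside the snippet window.
    out, pos = [], start
    for m in pattern.finditer(text):
        if m.start() >= start and m.end() <= end:
            out.append(text[pos:m.start()])
            out.append(f"<{tag}>{m.group(0)}</{tag}>")
            pos = m.end()
    out.append(text[pos:end])

    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + "".join(out) + suffix


if __name__ == "__main__":
    print(highlight("The budget vote", ["budget", "vote"]))
    print(highlight("aaaa budget bbbb", ["budget"], context=2))
```

For the second call the window is two characters on each side of "budget", so the snippet comes back as `...a <em>budget</em> b...`.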

...and these are just a few of the features that people who work with search deal with every day. I'm not saying they can't be solved with more traditional database functionality, but doing so requires you to implement code, keep content in sync, and generally write a lot of code to get what you get for free from technology built specifically to solve the problem.

Performance depends on many factors, but it will probably be more than adequate. Most solutions scale horizontally, so you can add servers as needed. You probably won't need to do that for a while, though, so don't worry about it yet: premature optimization, etc.
