How good are modern recommendation engines?

What quality of recommendation should a new recommendation system have in order to be competitive?

By "quality of recommendation" I mean the following. Suppose the recommendation system presents a user with X items. Then I ask the user how many of them he or she really liked (would buy), and it turns out they liked Y of them. The quality of the recommendation is Y / X (the best possible value is 1, which means that the user liked all the recommended items).
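To make the Y / X definition concrete, here is a minimal sketch in Python. The function name `recommendation_quality` and the sample items are made up for illustration; this ratio is what is usually called precision.

```python
def recommendation_quality(recommended, liked):
    """Return Y / X: the fraction of recommended items the user liked."""
    if not recommended:
        raise ValueError("no items were recommended")
    liked = set(liked)
    # Y = number of recommended items that the user actually liked
    hits = sum(1 for item in recommended if item in liked)
    # X = number of recommended items
    return hits / len(recommended)


# Hypothetical data: 4 items recommended, the user liked 2 of them.
recommended = ["bread", "milk", "coffee", "orange juice"]
liked = {"milk", "coffee", "tea"}
quality = recommendation_quality(recommended, liked)
print(quality)
```

Items the user liked but that were never recommended ("tea" here) do not affect this ratio.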

What recommendation quality do

  • average and
  • the best

recommendation systems have, approximately?

Update 1: Here (page 64) the authors write that in 2007 the top two Netflix algorithms achieved RMSEs of 0.8914 and 0.8990, respectively.

The definition of RMSE can be found on page 63, but I don't understand what that means.

+3




2 answers


You are actually asking a rather interesting question. There is a lively debate in the academic community about a) what a "good" recommendation means, and b) which metrics to use for prediction accuracy and other evaluative measures.

You asked:

What is the average and best recommendation quality of the recommendation systems, approximately?

The answer is that it depends on a lot of different things. In short, there is no real consensus on "average" or "best" figures for recommendation systems in general, but you can find benchmarks for specific kinds of recommendation systems, for example movie recommendation systems.

To help you with a little more background:

The root mean square error (RMSE) is used as a measure of prediction accuracy. That is, given a set of items (bread, milk, coffee, orange juice), how well can the system predict my ratings for those items, or how well can it predict that I will buy them?



You can use RMSE when you have a set of predicted user ratings for a set of items, and you also have the actual ratings for those items. Typically you will use RMSE in an "offline" experiment with your real dataset. During this process, you "hide" some of the real ratings and see whether the system can predict the hidden ratings. The "error" part of RMSE is the difference between the predicted rating and the actual rating. Each error is first squared, then the average of these squared errors over the set of items for that user is taken (the "mean" part), and finally the square root is taken (the "root" part of the name). Because RMSE squares the errors first, it penalizes large errors disproportionately compared to other accuracy metrics such as mean absolute error (MAE).
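As a rough illustration of the steps above, here is a minimal Python sketch of RMSE and MAE. The function names and the ratings are invented for the example; in a real offline experiment the "actual" values would be the hidden ratings from your dataset.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    # "error" -> squared -> "mean" -> "root"
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical hidden ratings and two sets of predictions for them.
actual = [4.0, 3.0, 5.0, 2.0]
small_errors = [3.0, 4.0, 4.0, 3.0]   # every prediction is off by 1
one_big_error = [4.0, 3.0, 5.0, 6.0]  # three exact, one off by 4

print(mae(small_errors, actual), rmse(small_errors, actual))
print(mae(one_big_error, actual), rmse(one_big_error, actual))
```

Both prediction sets have the same MAE, but the set with a single large miss has twice the RMSE, which is the disproportionate penalty described above.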

There is much more to making a good recommendation than making accurate predictions, which is why there is no standard or average figure. Several different metrics can be used for accuracy, and accuracy is only a small part of measuring the effectiveness of a recommendation system; the other parts have several metrics of their own. It also depends on what you are recommending. Recommending someone to go on a date with is hardly the same as recommending what food to buy online. I have seen RMSEs of 0.8+ for movie rating recommenders and 0.2+ for assignment recommenders. Keep in mind that RMSE values are only comparable when the rating scales are the same.

I recommend reading the papers below if you want a better (non-mathematical) appreciation of the complexities of evaluating recommendation systems:

Herlocker, Konstan, Terveen and Riedl - "Evaluating Collaborative Filtering Recommender Systems" (2004) is a good paper to start with for understanding the different approaches that can be used to evaluate recommender system performance.

Another good paper is McNee, Riedl and Konstan - "Being Accurate is Not Enough: How Accuracy Metrics have hurt Recommender Systems" (2006).

+2




For a good, fast and highly customizable recommendation engine, I can recommend http://www.sajari.com . It lets you boost recommended results from a given dataset based on locality, popularity, data similarity, duration, etc., so you can significantly customize your own recommendation engine. IMO, a good recommendation system needs at least that to be competitive, and best of all, it's not a black box: you can control the output.



+1

