Amazon item-to-item collaborative filtering

I am trying to fully understand Amazon's item-to-item algorithm so that I can apply it in my system and recommend items a user might like, based on the items that user previously liked.

So far I've read the Amazon paper, an item-to-item presentation, and material on item-based algorithms. I also found this question, but it left me more confused.

As far as I can tell, I need to follow these steps to get a list of recommended items:

  • Start with my data set of items that users liked (I have liked = 1 and disliked = 0).
  • Use the Pearson correlation coefficient (how is this done? I found the formula, but is there an example?).
  • Then what should I do?

So I came up with these questions:

  • What is the difference between item-to-item and item-based filtering? Are the two algorithms the same?
  • Is it valid to replace a rating score with liked / not liked?
  • Is it right to use the item-to-item algorithm, or are there others more suitable for my case?

Any information on this topic would be appreciated.

1 answer


Great questions.

Think about your data. You can have unary (consumed or null), binary (liked and disliked), ternary (liked, disliked, unknown/null), continuous (null and some numeric scale), or even ordinal (null and some ordinal scale) data. Different algorithms work better with different data types.

Item-to-item collaborative filtering (also called item-based) works best with numeric or ordinal scales. If you only have unary, binary, or ternary data, you may be better off with data mining algorithms such as association rule mining.

Given a matrix of users and their item ratings, you can calculate the similarity of each item to every other item. Matrix manipulation and computation is built into many libraries: for example, try scipy and numpy in Python. You can just iterate over the items and use the built-in matrix calculations to do most of the work of computing the cosine similarity (https://en.wikipedia.org/wiki/Cosine_similarity). Or download a framework like Mahout or Lenskit to do it for you.
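To make the similarity step concrete, here is a minimal numpy sketch of item-to-item cosine similarity. The function name `item_cosine_similarity`, the toy rating matrix, and the choice to treat a 0 as "no rating" are all assumptions made for illustration; they are not taken from the Amazon paper or from any framework.

```python
import numpy as np

def item_cosine_similarity(ratings):
    """Return an (items x items) cosine similarity matrix.

    `ratings` is a (users x items) array; each column is one item's
    rating vector. A 0 is assumed to mean "no rating" (illustrative choice).
    """
    ratings = np.asarray(ratings, dtype=float)
    # Euclidean length of every item column.
    norms = np.linalg.norm(ratings, axis=0)
    # Items nobody rated have length 0; use 1 to avoid dividing by zero.
    norms[norms == 0] = 1.0
    normalized = ratings / norms
    # Dot products of unit-length columns are cosine similarities.
    return normalized.T @ normalized

# Toy data (hypothetical): 4 users in rows, 4 items in columns.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
])
similarity = item_cosine_similarity(ratings)
```

In this toy data, items 0 and 1 are rated highly by the same users, so `similarity[0, 1]` comes out much larger than `similarity[0, 2]`. With a large, sparse rating matrix you would probably want scipy's sparse matrices or one of the frameworks mentioned above rather than a dense array.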



Now that you have a matrix of every item's similarity to every other item, you can suggest items for user U. Look at her item history. For each history item I, and for each item ID in your dataset, add the similarity of I to ID to ID's score in a list of candidate items. Once you've gone through all the history items, sort the candidate list by score in descending order and recommend the top ones.
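The scoring loop described above can be sketched roughly as follows. The function name `recommend`, the `top_n` parameter, and the hand-written similarity matrix are hypothetical. Skipping items the user already has in her history is an extra assumption on my part; the description above does not require it.

```python
import numpy as np

def recommend(similarity, history, top_n=3):
    """Rank candidate items for one user from an item-item similarity matrix.

    `history` is a list of item indices the user already liked or rated.
    """
    similarity = np.asarray(similarity, dtype=float)
    scores = np.zeros(similarity.shape[0])
    # For each history item, add its similarity to every item in the dataset.
    for item in history:
        scores += similarity[item]
    # Assumption: do not recommend items the user already has.
    scores[list(history)] = -np.inf
    # Sort candidates by accumulated score, highest first.
    ranked = np.argsort(-scores)
    return [int(j) for j in ranked if np.isfinite(scores[j])][:top_n]

# Hypothetical similarity matrix for 4 items.
similarity = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])
top_for_user = recommend(similarity, history=[0], top_n=2)
```

Here a user whose history contains only item 0 gets item 1 first, because it is the most similar item she has not seen yet. A common refinement, not covered above, is to weight each history item's contribution by the rating the user gave it.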

To answer the remaining questions: a continuous or ordinal scale will give you the best collaborative filtering results. Don't use a "liked" / "disliked" scale if you have better data.

Matrix factorization algorithms also work well, and if you have few users and your rating matrix doesn't get a lot of updates, you can also use user-to-user collaborative filtering. Try item-to-item first, though: it's a good general-purpose recommendation algorithm.
