How to match elements to tags based on "similarity"

I have a real question.

I have a database with a schema:

item

  • id
  • description
  • other fields (not relevant here)

tag

  • id
  • name

item2tag

  • item_id
  • tag_id
  • amount

Basically, each item is tagged with up to 10 tags, each in a different amount. There are 50,000 items and 50,000 tags, and about 500,000 entries in item2tag. Given one item, I would like to find the "most similar" item.

By "most similar" I mean the item with the closest combination of tags: if something is twice as "cool" as it is "funny", I want to find all the other things that are also roughly twice as "cool" as they are "funny". Of course, this should work across all 10 tags, not just 2.

Any ideas?


2 answers


Well, you could use linear algebra: give each item an n-dimensional vector (one dimension per tag) and then calculate the distance between items to find the closest ones. Comparing every item against every other gets computationally heavy even with small datasets, though.

This is why Google came up with MapReduce. It will probably be your best bet, but even then it is non-trivial.
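
As a rough illustration of the vector idea, here is a minimal Python sketch. It assumes the item2tag rows have already been read into (item_id, tag_id, amount) tuples, and it uses cosine similarity as the distance measure. That choice is my assumption, not something from the question, but it fits the "twice as cool as funny" requirement because it compares ratios of amounts rather than absolute amounts. The function names and the sample data are made up for the example.

```python
import math

def build_vectors(rows):
    """Turn (item_id, tag_id, amount) rows into sparse vectors {tag_id: amount}."""
    vectors = {}
    for item_id, tag_id, amount in rows:
        vectors.setdefault(item_id, {})[tag_id] = float(amount)
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 1.0 means same tag ratios."""
    # Walk the shorter vector when computing the dot product.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(amount * b.get(tag, 0.0) for tag, amount in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(target_id, vectors):
    """Brute-force scan: compare the target against every other item."""
    target = vectors[target_id]
    best_id, best_score = None, -1.0
    for item_id, vec in vectors.items():
        if item_id == target_id:
            continue
        score = cosine_similarity(target, vec)
        if score > best_score:
            best_id, best_score = item_id, score
    return best_id, best_score

# Hypothetical sample data: tag 1 = "cool", tag 2 = "funny", tag 3 = "sad".
rows = [
    (1, 1, 2), (1, 2, 1),  # item 1: twice as cool as funny
    (2, 1, 4), (2, 2, 2),  # item 2: same ratio, larger amounts
    (3, 1, 1), (3, 2, 2),  # item 3: ratio reversed
    (4, 3, 5),             # item 4: shares no tags with item 1
]
vectors = build_vectors(rows)
best_id, best_score = most_similar(1, vectors)
```

Each lookup here touches every item, so finding the nearest neighbour of all 50,000 items means on the order of 50,000 × 50,000 comparisons, which is where the cost comes from.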



-Adam



Once you represent the item-tag relationship as vectors, what you have is an instance of nearest neighbor search. You can find pointers in the collaborative filtering literature.
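
One simple way to make that search cheaper is sketched below in Python. It is an exact search with pruning, not one of the approximate methods from the literature: an inverted index from tag to items means only items sharing at least one tag with the target are ever scored. The names build_index and nearest, the cosine scoring, and the sample rows are my own assumptions for illustration.

```python
import heapq
import math
from collections import defaultdict

def build_index(rows):
    """Build unit-length sparse vectors and an inverted index tag_id -> [(item_id, weight)]."""
    raw = defaultdict(dict)
    for item_id, tag_id, amount in rows:
        raw[item_id][tag_id] = float(amount)

    unit = {}
    postings = defaultdict(list)
    for item_id, vec in raw.items():
        norm = math.sqrt(sum(v * v for v in vec.values()))
        if norm == 0.0:
            continue  # an item whose amounts are all zero cannot be compared
        unit[item_id] = {tag: v / norm for tag, v in vec.items()}
        for tag, weight in unit[item_id].items():
            postings[tag].append((item_id, weight))
    return unit, postings

def nearest(target_id, unit, postings, k=5):
    """Return up to k (item_id, cosine) pairs, scoring only items that share a tag."""
    scores = defaultdict(float)
    for tag, weight in unit[target_id].items():
        for item_id, other_weight in postings[tag]:
            if item_id != target_id:
                # Vectors are unit length, so summed products equal the cosine.
                scores[item_id] += weight * other_weight
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])

# Hypothetical sample data: tag 1 = "cool", tag 2 = "funny", tag 3 = "sad".
rows = [
    (1, 1, 2), (1, 2, 1),  # item 1: twice as cool as funny
    (2, 1, 4), (2, 2, 2),  # item 2: same ratio, larger amounts
    (3, 1, 1), (3, 2, 2),  # item 3: ratio reversed
    (4, 3, 5),             # item 4: shares no tags with item 1
]
unit, postings = build_index(rows)
result = nearest(1, unit, postings, k=5)
```

With about 500,000 item2tag rows spread over 50,000 tags, a tag's list holds around 10 items on average, so a query for an item with 10 tags scores far fewer candidates than a full scan. Very popular tags will have much longer lists, so the real saving depends on how the tags are distributed.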


