How to match elements to tags based on "similarity"

Question

I have a real question.

I have a database with this schema:

element
    id
    description
    (other columns, not relevant here)

tag
    id
    name

item2tag
    item_id
    tag_id
    amount

Basically, each item is tagged with up to 10 tags, each in a different amount. There are 50,000 items, 50,000 tags, and about 500,000 entries in item2tag. Given one element, I would like to find the "most similar" element.

By "most similar" I mean the element with the closest combination of tags. If something is twice as "cool" as it is "funny", I want to find all the other things that are also roughly twice as "cool" as they are "funny". Of course, this should work across all 10 tags, not just 2.

Any ideas?

database nearest-neighbor tagging cosine

John, Nov 25 '08 at 7:37

2 answers

Adam Davis · Answer 1 · 2008-11-25T07:40:56+0000

Well, you could use linear algebra: represent each element as an n-dimensional vector (one dimension per tag) and then calculate the distance between vectors to find the closest elements. But this is quite difficult even with small datasets.
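
One way to make the vector-and-distance idea concrete is cosine similarity, which compares the proportions of tags rather than their absolute amounts. The following is only a minimal sketch: it assumes each element's tags have already been loaded into a dict of {tag_id: amount}, and the function name and the sample data are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as {tag_id: amount}."""
    # Iterate over the shorter vector; tags missing from the other one contribute 0.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b.get(tag, 0.0) for tag, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an element with no tags is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical data: tag 1 = "cool", tag 2 = "funny".
x = {1: 4.0, 2: 2.0}    # twice as cool as funny
y = {1: 10.0, 2: 5.0}   # same 2:1 ratio, larger amounts
z = {1: 2.0, 2: 4.0}    # ratio reversed

print(cosine_similarity(x, y))  # close to 1: same proportions
print(cosine_similarity(x, z))  # noticeably lower: different proportions
```

Because only the direction of the vector matters, x and y score as nearly identical even though their amounts differ, which matches the "twice as cool as funny" requirement in the question.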

This is why Google came up with MapReduce. It is probably your best bet, but even then it is non-trivial.

-Adam

Yuval F · Answer 2 · 2008-11-25T12:02:32+0000

Given your representation of the item-tag relationship as vectors, what you have is an instance of nearest neighbor search. You can find pointers in the field of collaborative filtering.
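
As a rough sketch of what such a nearest neighbor search could look like for this schema: since each item has at most 10 tags, an inverted index from tag to items limits the comparison to items that share at least one tag with the query (any other item has cosine similarity 0 anyway). The function names and the sample rows below are hypothetical; the rows stand in for (item_id, tag_id, amount) tuples read from item2tag, and cosine similarity is an assumed choice of measure.

```python
import math
from collections import defaultdict

def build_index(rows):
    """Build sparse vectors and an inverted index from (item_id, tag_id, amount) rows."""
    vectors = defaultdict(dict)   # item_id -> {tag_id: amount}
    postings = defaultdict(set)   # tag_id -> ids of items carrying that tag
    for item_id, tag_id, amount in rows:
        vectors[item_id][tag_id] = float(amount)
        postings[tag_id].add(item_id)
    return vectors, postings

def most_similar(item_id, vectors, postings, k=1):
    """Return up to k (other_item_id, cosine) pairs, best match first."""
    query = vectors[item_id]
    q_norm = math.sqrt(sum(w * w for w in query.values()))
    # Accumulate dot products only for items sharing a tag with the query.
    dots = defaultdict(float)
    for tag_id, w in query.items():
        for other in postings[tag_id]:
            if other != item_id:
                dots[other] += w * vectors[other][tag_id]
    scored = []
    for other, dot in dots.items():
        o_norm = math.sqrt(sum(w * w for w in vectors[other].values()))
        if q_norm and o_norm:
            scored.append((dot / (q_norm * o_norm), other))
    scored.sort(reverse=True)
    return [(other, score) for score, other in scored[:k]]

# Hypothetical rows: tags 10 = "cool", 11 = "funny", 12 = something unrelated.
rows = [
    (1, 10, 4), (1, 11, 2),
    (2, 10, 10), (2, 11, 5),
    (3, 10, 2), (3, 11, 4),
    (4, 12, 7),
]
vectors, postings = build_index(rows)
print(most_similar(1, vectors, postings, k=2))
```

With very popular tags the candidate set can still grow large, so this is an exact but not necessarily fast search; approximate methods are a separate topic.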
