SOM-based recommendations
My friend and I decided to build a recommendation engine in Python. We originally planned to use an SVM, but soon found that difficult because it is a supervised learning method. We now plan to use a self-organizing map (SOM) and possibly combine it with collaborative filtering (we don't know if that's possible) to build the engine. Can someone suggest a good resource on self-organizing maps? Also, are there any alternatives to collaborative filtering?
Many thanks.
I'm not sure a self-organizing map is actually the best fit for your application. It preserves the topological properties of your input space, but it is not well suited to sparse datasets, and sparsity is a persistent problem in recommendation engines. I won't say that an SVM is better (in fact it is probably much more than you actually want), but a SOM will only be slightly better. However, if you want to learn how to build a SOM, the following resources are worth checking out, listed in order of usefulness. It is also worth mentioning that the SOM is very close in theory to a convolutional neural network, so resources on those should carry over well.
http://en.wikipedia.org/wiki/Self-organizing_map
http://ftp.it.murdoch.edu.au/units/ICT219/Papers%20for%20transfer/papers%20on%20Clustering/Clustering%20SOM.pdf
http://www.eicstes.org/EICSTES_PDF/PAPERS/The%20Self-Organizing%20Map%20%28Kohonen%29.pdf
http://www.cs.bham.ac.uk/~jxb/NN/l16.pdf
http://www.willamette.edu/~gorr/classes/cs449/Unsupervised/SOM.html
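To make the SOM idea concrete, here is a minimal sketch of the usual training loop: find the best matching unit (BMU) for each input, then pull that node and its grid neighbours toward the input while the learning rate and neighbourhood radius decay. It uses plain Python so it runs without extra packages; the function names, the linear decay schedule, and the toy data are my own assumptions, not taken from any of the links above.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(weights,
               key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    # One weight vector per grid node, initialised randomly in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both shrink over time
        # (a simple linear schedule, chosen here for illustration).
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.5)
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood measured on the grid, not in input space.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Toy data: two well-separated groups of dense 3-dimensional vectors.
cluster_a = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.15], [0.15, 0.1, 0.2]]
cluster_b = [[0.9, 0.8, 0.9], [0.8, 0.9, 0.85], [0.85, 0.9, 0.8]]
som = train_som(cluster_a + cluster_b, rows=3, cols=3)
bmu_a = best_matching_unit(som, [0.1, 0.1, 0.1])
bmu_b = best_matching_unit(som, [0.9, 0.9, 0.9])
print(bmu_a, bmu_b)
```

Note that the toy vectors are dense. With real rating data most entries are missing, which is exactly where the distance computation in `best_matching_unit` stops being meaningful.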
As for approaches that will probably make more sense for your specific application, I would suggest a restricted Boltzmann machine (RBM). The idea with an RBM is that you define a feature vector for each user from various statistics about them, and from it try to create a "recommendation profile" for that user. This basic prediction works in a way that resembles a deep neural network.
Once your network is trained in one direction, the real brilliance of the RBM is that you then run it backwards: you try to generate user profiles from recommendation profiles, which works very well for this kind of application. For more information on RBMs, see these links:
http://deeplearning.net/tutorial/rbm.html
http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf
http://www.cs.toronto.edu/~hinton/absps/netflix.pdf
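As a rough illustration of the forward-then-backward idea, here is a tiny binary RBM trained with one step of contrastive divergence (CD-1). This is a sketch under loud assumptions: the class `TinyRBM`, its method names, the hyperparameters, and the like/dislike data are all made up for this example, and it treats every unrated item as a 0. The Netflix paper linked above instead uses softmax visible units over rating values and handles missing ratings explicitly, so this is not that model.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


class TinyRBM:
    """Hypothetical minimal binary RBM, for illustration only."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.nv, self.nh = n_visible, n_hidden
        self.W = [[self.rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.bv = [0.0] * n_visible
        self.bh = [0.0] * n_hidden

    def hidden_probs(self, v):
        # Forward direction: visible (items) -> hidden (latent profile).
        return [sigmoid(self.bh[j] + sum(v[i] * self.W[i][j] for i in range(self.nv)))
                for j in range(self.nh)]

    def visible_probs(self, h):
        # Backward direction: hidden (latent profile) -> visible (items).
        return [sigmoid(self.bv[i] + sum(h[j] * self.W[i][j] for j in range(self.nh)))
                for i in range(self.nv)]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_step(self, v0, lr=0.1):
        # Positive phase on the data, negative phase on a one-step reconstruction.
        h0 = self.hidden_probs(v0)
        v1 = self.visible_probs(self.sample(h0))
        h1 = self.hidden_probs(v1)
        for i in range(self.nv):
            for j in range(self.nh):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.bv[i] += lr * (v0[i] - v1[i])
        for j in range(self.nh):
            self.bh[j] += lr * (h0[j] - h1[j])

    def reconstruct(self, v):
        return self.visible_probs(self.hidden_probs(v))


# Made-up data: three users like items 0-2, three users like items 3-5.
likes = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0]] * 3 + [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0]] * 3
rbm = TinyRBM(n_visible=6, n_hidden=3)
for _ in range(1000):
    for v in likes:
        rbm.cd1_step(v)

# A new user who liked items 0 and 1: the reconstruction scores every item.
scores = rbm.reconstruct([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
print([round(s, 2) for s in scores])
```

The reconstruction is the "run it backwards" step: items the user has not interacted with receive a score, and the highest-scoring unseen items become the recommendations.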
Hinton is the main authority in these areas and is also a general BAMF in data science. The last link in the RBM list would actually be enough to build your entire recommendation engine on its own, but in case you want to use more out-of-the-box libraries or draw on other parts of data science, I would suggest applying some dimensionality reduction before attempting collaborative filtering.
The biggest problem with collaborative filtering is that you usually have a very sparse matrix that doesn't quite give you the information you want and ends up holding a lot of entries that aren't useful to you. For this reason, there are a number of algorithms from the field of topic modeling that will give you a lower-dimensional representation of the data. That either makes collaborative filtering trivial, or can be fed into any of the other approaches above to get more meaningful data with less computational effort.
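The sparsity problem is easy to see in a bare-bones user-based collaborative filter. The sketch below uses cosine similarity over sparse rating dicts; the users, items, ratings, and function names are all invented for illustration.

```python
import math

# Hypothetical sparse ratings: each user has rated only a few items.
ratings = {
    "alice": {"matrix": 5, "inception": 4, "titanic": 1},
    "bob": {"matrix": 4, "inception": 5, "alien": 5},
    "carol": {"titanic": 5, "notebook": 4},
    "dave": {"amelie": 3},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (missing = 0)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0  # no co-rated items: nothing to compare
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=3):
    """Rank unseen items by the similarity-weighted ratings of other users."""
    scores, weights = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        if sim <= 0.0:
            continue
        for item, r in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((s / weights[i], i) for i, s in scores.items()), reverse=True)
    return [item for _, item in ranked[:top_n]]


print(recommend("alice", ratings))
print(recommend("dave", ratings))
```

Dave shares no rated items with anyone, so every similarity is zero and he gets no recommendations at all. Projecting users into a lower-dimensional space first is one way to give such users comparable representations.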
gensim is a Python package that has a large amount of topic modeling already implemented for you, and it will also build tf-idf vectors for use with numpy and scipy. It is also very well documented. The examples, however, are aimed at straightforward NLP. Just keep in mind that the fact that their individual elements are words does not affect the underlying algorithms, so you can use it for more general systems as well.
If you want to go for the gold in topic modeling, you should really take a look at Pachinko Allocation (PA), a newer topic modeling algorithm that shows more promise than most other topic models, but doesn't come bundled with any packages.
http://www.bradblock.com/Pachinko_Allocation_DAG_Structured_Mixture_Models_of_Topic_Correlations.pdf
I wish you the best of luck with your data science project! Let me know if you have any more questions and I can try to answer them.