Scikit-learn - Stochastic Gradient Descent with Custom Cost and Gradient Functions
I am implementing matrix factorization to predict the rating a reviewer gives to a movie. The dataset is taken from MovieLens (http://grouplens.org/datasets/movielens/). This is a well-studied recommendation problem, so I am implementing matrix factorization purely as a learning exercise.
I model the cost function as the root mean square error between the predicted and actual ratings in the training set, and I use scipy.optimize.minimize (with conjugate gradient descent) to factor the rating matrix. However, this optimization tool is too slow even on the 100K-rating dataset, and I plan to scale the algorithm to a dataset with 20 million ratings.
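For concreteness, a stripped-down version of this kind of setup could look like the sketch below. It is only an illustration: the factor matrices P and Q, the latent dimension k, and the plain squared-error cost (instead of RMSE) are choices made for the example, not necessarily what my code does.

import numpy as np
from scipy.optimize import minimize

def unpack(x, n_users, n_items, k):
    # the optimizer works on a flat vector, so split it back into P and Q
    P = x[:n_users * k].reshape(n_users, k)
    Q = x[n_users * k:].reshape(n_items, k)
    return P, Q

def cost(x, R, mask, n_users, n_items, k):
    P, Q = unpack(x, n_users, n_items, k)
    err = mask * (P @ Q.T - R)        # only observed ratings contribute
    return 0.5 * np.sum(err ** 2)

def grad(x, R, mask, n_users, n_items, k):
    P, Q = unpack(x, n_users, n_items, k)
    err = mask * (P @ Q.T - R)
    return np.concatenate([(err @ Q).ravel(), (err.T @ P).ravel()])

n_users, n_items, k = 943, 1682, 10          # MovieLens 100K shape; k is arbitrary
R = np.zeros((n_users, n_items))             # fill with the observed ratings
mask = (R > 0).astype(float)                 # 1 where a rating exists
x0 = 0.1 * np.random.rand(n_users * k + n_items * k)
res = minimize(cost, x0, jac=grad, method='CG',
               args=(R, mask, n_users, n_items, k))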
I was looking for a Python-based stochastic gradient descent solution, but the SGD implementation in scikit-learn only supports its built-in loss functions, so I cannot plug in my custom cost and gradient functions.
I can implement my own stochastic gradient descent, but I wanted to check with you guys whether a tool for this already exists.
Basically, I'm wondering if there is an API similar to this one:
optimize.minimize(my_cost_function,
                  my_input_param,
                  jac=my_gradient_function,
                  ...)
Thanks!
Vanilla stochastic gradient descent (at least) is so easy to implement that I don't think there is a framework built around it. The update is simply

my_input_param -= alpha * my_gradient_function(my_input_param)
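In a stochastic setting the gradient is evaluated on one training example at a time, so a bare-bones loop around a user-supplied per-example gradient function could look like this sketch (the function and parameter names are just placeholders):

import numpy as np

def sgd(x0, my_gradient_function, data, alpha=0.01, n_epochs=10):
    # minimal stochastic gradient descent: one update per training example
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_epochs):
        np.random.shuffle(data)               # visit examples in random order
        for example in data:
            x -= alpha * my_gradient_function(x, example)
    return x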
You may also want to take a look at Theano, which will do the differentiation for you. Depending on what you want to do, it can be a bit of overkill, though.
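For example, a minimal Theano sketch where the gradient is derived symbolically might look like this (the variable names and the squared-error cost are just placeholders for whatever custom cost you have):

import numpy as np
import theano
import theano.tensor as T

X = T.matrix('X')                          # features
y = T.vector('y')                          # targets
w = theano.shared(np.zeros(3), name='w')   # parameters to learn

pred = T.dot(X, w)
cost = T.mean((pred - y) ** 2)             # plug any custom cost expression in here
grad = T.grad(cost, w)                     # Theano derives the gradient symbolically

alpha = 0.01
train_step = theano.function([X, y], cost,
                             updates=[(w, w - alpha * grad)])
# calling train_step(X_batch, y_batch) repeatedly performs the descent updates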
I'm trying to do something similar in R, but with a different custom cost function.
As I understand it, the key is to compute the gradient and follow it down to a local minimum.
With linear regression (y = mx + c) and a least-squares cost function

(mx + c - y)^2

the partial derivative with respect to m is

2x(mx + c - y)
With more traditional machine learning notation, where m = theta, this gives us

theta <- theta - learning_rate * t(X) %*% (X %*% theta - y) / length(y)
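Switching back to Python/NumPy (the language the question uses), the same batch update on some made-up data would look like this:

import numpy as np

X = np.column_stack([np.random.rand(100), np.ones(100)])   # columns [x, 1], so theta = [m, c]
y = 3.0 * X[:, 0] + 2.0 + 0.1 * np.random.randn(100)       # synthetic data with m = 3, c = 2
theta = np.zeros(2)
learning_rate = 0.1
for _ in range(1000):
    theta -= learning_rate * X.T @ (X @ theta - y) / len(y)
# theta should end up close to [3.0, 2.0]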
I don't know for sure, but I would assume that for linear regression with the cost function sqrt(mx + c - y), the gradient step is the partial derivative with respect to m, which I believe is

x/(2*sqrt(mx + c - y))
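One way to sanity-check a hand-derived gradient like this is to compare it against a numerical approximation; in Python that could be done with scipy.optimize.check_grad (the cost and gradient below are just an example, not the sqrt cost above):

import numpy as np
from scipy.optimize import check_grad

X = np.random.rand(50, 2)
y = np.random.rand(50)

def cost(theta):
    return 0.5 * np.mean((X @ theta - y) ** 2)

def grad(theta):
    return X.T @ (X @ theta - y) / len(y)

# check_grad returns the norm of the difference between grad and a
# finite-difference approximation; a tiny value means the derivation matches
print(check_grad(cost, grad, np.zeros(2)))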
If any or all of this is wrong, please correct me. This is something I am trying to learn myself, and I would appreciate knowing if I am heading in the completely wrong direction.