Scikit-learn - Stochastic Gradient Descent with Custom Cost and Gradient Functions

I am implementing matrix factorization to predict a reviewer's rating of a movie. The dataset is from MovieLens ( http://grouplens.org/datasets/movielens/ ). This is a well-studied recommendation problem, so I am implementing this matrix factorization method purely for my own learning.

I am modeling the cost function as the root mean square error between the predicted and actual ratings in the training set. I am using the scipy.optimize.minimize function (with conjugate gradient descent) to factor the movie rating matrix, but this optimization tool is too slow even for a dataset with only 100K ratings. I plan to scale my algorithm to a dataset with 20 million ratings.
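For reference, here is a minimal sketch of the kind of setup described above, with illustrative names and a toy ratings matrix (none of this is the actual code). It minimizes the squared error over the observed entries, which has the same minimizer as the RMSE:

import numpy as np
from scipy.optimize import minimize

n_users, n_movies, n_factors = 4, 5, 2        # toy sizes; MovieLens 100K is 943 x 1682
R = np.zeros((n_users, n_movies))             # ratings matrix, 0 = not rated
R[0, 0], R[0, 1], R[1, 0], R[2, 3] = 5.0, 3.0, 4.0, 1.0
mask = R > 0                                  # which entries are observed

def unpack(x):
    # The optimizer works on a flat vector; split it into the two factor matrices.
    U = x[:n_users * n_factors].reshape(n_users, n_factors)
    V = x[n_users * n_factors:].reshape(n_movies, n_factors)
    return U, V

def cost(x):
    U, V = unpack(x)
    err = (U @ V.T - R) * mask                # error only on observed ratings
    return 0.5 * np.sum(err ** 2)

def grad(x):
    U, V = unpack(x)
    err = (U @ V.T - R) * mask
    return np.concatenate([(err @ V).ravel(), (err.T @ U).ravel()])

x0 = np.random.rand(n_factors * (n_users + n_movies))
res = minimize(cost, x0, jac=grad, method='CG')   # conjugate gradient
U, V = unpack(res.x)                              # U @ V.T approximates R on the mask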

I was looking for a Python-based solution for stochastic gradient descent, but the stochastic gradient descent implementation in scikit-learn does not let me use my own custom cost and gradient functions.

I can implement my own stochastic gradient descent, but I wanted to check with you all whether a tool for this already exists.

Basically I'm wondering if there is an API similar to this one:

optimize.minimize(my_cost_function,
                  my_input_param,
                  jac=my_gradient_function,
                  ...)

Thanks!





2 answers


A vanilla version of this is so easy to implement that I don't think there is a framework built around it. The update is simply:

my_input_param -= alpha * my_gradient_function(my_input_param)



You might also want to take a look at Theano, which will do the differentiation for you. Depending on what you want to do, though, it can be a bit heavyweight.
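For concreteness, here is a minimal sketch of such a hand-rolled SGD loop for the matrix-factorization problem in the question. All names and hyperparameters are illustrative, not an existing library API:

import numpy as np

def sgd_factorize(ratings, n_factors=10, alpha=0.01, n_epochs=20):
    """ratings: list of (user_index, movie_index, rating) triples."""
    n_users = 1 + max(u for u, _, _ in ratings)
    n_movies = 1 + max(m for _, m, _ in ratings)
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_users, n_factors))    # user factors
    V = rng.normal(scale=0.1, size=(n_movies, n_factors))   # movie factors
    for _ in range(n_epochs):
        for i in rng.permutation(len(ratings)):     # visit examples in random order
            u, m, r = ratings[i]
            err = U[u] @ V[m] - r                   # error on a single example
            grad_u = err * V[m]                     # gradient of 0.5 * err**2 w.r.t. U[u]
            grad_v = err * U[u]                     # ... and w.r.t. V[m]
            U[u] -= alpha * grad_u                  # the update rule from above
            V[m] -= alpha * grad_v
    return U, V

# Toy usage with three observed ratings:
U, V = sgd_factorize([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)])
print(U @ V.T)    # predicted ratings for every (user, movie) pair

The per-example update is exactly the one-liner above, applied to a single training example at a time instead of the full batch.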





I'm trying to do something similar in R, but with a different custom cost function.

As I understand it, the key is to compute the gradient and follow the direction that takes you to a local minimum.

With linear regression (y = mx + c) and a least-squares cost function (mx + c - y)^2, the partial derivative with respect to m is 2x(mx + c - y).

In the more traditional machine-learning notation, where m = theta, this gives us

theta <- theta - learning_rate * t(X) %*% (X %*% theta - y) / length(y)
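In NumPy, the same batch update would look like this (an illustrative sketch mirroring the R line above, on a toy example):

import numpy as np

# Design matrix with a column of ones, so theta = [c, m] in y = mx + c.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
theta = np.zeros(2)
alpha = 0.1

for _ in range(1000):
    theta -= alpha * X.T @ (X @ theta - y) / len(y)   # batch gradient step

print(theta)   # approaches [1.0, 1.0], i.e. y = 1*x + 1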



I don't know for sure, but I would assume that for linear regression with the cost function sqrt(mx + c - y), the gradient step is again the partial derivative with respect to m, which by the chain rule I believe is x/(2*sqrt(mx + c - y)).

If any or all of this is wrong, please correct me. This is something I am trying to learn myself, and I would appreciate knowing if I am heading in completely the wrong direction.









