# Scikit-learn - Stochastic Gradient Descent with Custom Cost and Gradient Functions

I am implementing matrix factorization to predict the rating a reviewer gives a movie. The dataset is taken from MovieLens ( http://grouplens.org/datasets/movielens/ ). This is a well-studied recommendation problem, so I am implementing this matrix factorization method purely as a learning exercise.

I am modeling the cost function as the root mean square error between the predicted rating and the actual rating in the training set. I am using the scipy.optimize.minimize function (with the conjugate gradient method) to factorize the movie rating matrix, but this optimization tool is too slow even for the dataset with only 100K entries. I am planning to scale my algorithm to a dataset with 20 million entries.

I was looking for a Python-based solution for stochastic gradient descent, but the stochastic gradient descent I found in scikit-learn does not allow me to use my custom cost and gradient functions.

I can implement my own stochastic gradient descent, but I am checking with you guys whether a tool for this already exists.

Basically I'm wondering if there is an API similar to this one:

```
optimize.minimize(my_cost_function,
                  my_input_param,
                  jac=my_gradient_function,
                  ...)
```

Thanks!


It's so easy to implement (at least the vanilla method) that I don't think there is a "framework" around it. It's just

`my_input_param -= alpha * my_gradient_function(my_input_param)`
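As a rough illustration, the update rule can be wrapped in a loop that visits one sample at a time. Note the minus sign: the step goes against the gradient, since we are minimizing. The names `sgd_minimize` and `squared_error_grad` are made up for this sketch and are not part of any library, and the toy data (a noiseless line `y = 3x + 1`) is only there to exercise the loop.

```python
import numpy as np

def sgd_minimize(grad_fn, x0, data, alpha=0.01, epochs=10, seed=0):
    """Vanilla SGD. grad_fn(x, sample) returns the gradient of the per-sample cost."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        # Visit the samples in a fresh random order every epoch.
        for i in rng.permutation(len(data)):
            x -= alpha * grad_fn(x, data[i])
    return x

def squared_error_grad(params, sample):
    """Gradient of (m*x + c - y)^2 with respect to (m, c) for one sample."""
    m, c = params
    x, y = sample
    residual = m * x + c - y
    return np.array([2 * residual * x, 2 * residual])

# Hypothetical toy data: points on the line y = 3x + 1.
xs = np.linspace(-1, 1, 50)
data = np.column_stack([xs, 3 * xs + 1])
params = sgd_minimize(squared_error_grad, [0.0, 0.0], data, alpha=0.05, epochs=200)
print(params)
```

The custom gradient is just a plain Python function here, so swapping in a different cost only means writing a different `grad_fn`. A fixed step size is the simplest choice; decaying `alpha` over time is a common refinement.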

Maybe you want to take a look at Theano, which will do the differentiation for you. Depending on what you want to do, though, it may be overkill.
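For the matrix factorization in the question, the gradient is simple enough to write by hand. Here is a minimal sketch, assuming the ratings come as `(user, item, rating)` triples and the per-rating cost is half the squared error plus an optional L2 penalty; minimizing the squared error also minimizes the RMSE. The function names, the toy ratings, and the hyperparameter values are all assumptions for illustration, not tuned settings.

```python
import numpy as np

def factorize_sgd(ratings, n_users, n_items, k=2, alpha=0.02, reg=0.0,
                  epochs=1000, seed=0):
    """SGD matrix factorization. ratings is a list of (user, item, rating)."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))  # user factors
    Q = 0.1 * rng.standard_normal((n_items, k))  # item factors
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = P[u] @ Q[i] - r
            # Gradients of 0.5 * err^2 + 0.5 * reg * (|P[u]|^2 + |Q[i]|^2),
            # both computed before either factor is updated.
            grad_p = err * Q[i] + reg * P[u]
            grad_q = err * P[u] + reg * Q[i]
            P[u] -= alpha * grad_p
            Q[i] -= alpha * grad_q
    return P, Q

def rmse(ratings, P, Q):
    """Root mean square error over the observed ratings."""
    errs = [P[u] @ Q[i] - r for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errs))))

# Hypothetical toy data: 3 users, 3 movies, 6 observed ratings.
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
       (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = factorize_sgd(toy, n_users=3, n_items=3)
print(rmse(toy, P, Q))
```

Each update touches only one row of `P` and one row of `Q`, which is why this approach tends to scale to large rating sets better than a full-batch optimizer.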


I'm trying to do something similar in R, but with a different custom cost function.

As I understand it, the key is to find the gradient and see which direction takes you toward the local minimum.

With linear regression (`y = mx + c`) and a least squares function, our cost function is `(mx + c - y)^2`.

The partial derivative of this with respect to m is `2x(mx + c - y)`.

With the more traditional machine learning notation, where `m = theta`, this gives us the update `theta <- theta - learning_rate * t(X) %*% (X %*% theta - y) / length(y)` (the constant factor of 2 is absorbed into the learning rate).
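Since the question is about Python, here is a rough NumPy translation of that batch update, where `t(X) %*% (X %*% theta - y)` becomes `X.T @ (X @ theta - y)`. The function name, the toy data, and the learning rate are assumptions chosen for this sketch.

```python
import numpy as np

def linear_regression_gd(X, y, learning_rate=0.5, n_iter=1000):
    """Batch gradient descent on the mean squared error of X @ theta."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Same update as the R expression above.
        theta -= learning_rate * X.T @ (X @ theta - y) / len(y)
    return theta

# Hypothetical toy data on the line y = 2x + 0.5.
x = np.linspace(0, 1, 20)
X = np.column_stack([x, np.ones_like(x)])  # columns: slope term, intercept term
y = 2.0 * x + 0.5
theta = linear_regression_gd(X, y)
print(theta)
```

The column of ones lets `theta` hold both the slope `m` and the intercept `c`, so a single matrix expression updates them together.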

I don't know for sure, but I would assume that for linear regression with the cost function `sqrt(mx + c - y)`, the gradient step uses the partial derivative with respect to m, which I believe is `x/(2*sqrt(mx + c - y))`.
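One way to sanity-check a hand-derived gradient is to compare it against a finite-difference estimate. The sketch below does that for the candidates `2x(mx + c - y)` (squared cost) and `x/(2*sqrt(mx + c - y))` (square-root cost). The sample values are arbitrary and picked so that `mx + c - y` is positive, since the square root is undefined otherwise.

```python
import math

def numeric_derivative(f, m, h=1e-6):
    """Central finite-difference approximation of df/dm."""
    return (f(m + h) - f(m - h)) / (2 * h)

# Arbitrary sample values (assumed for illustration): m*x + c - y = 6 > 0.
x, c, y = 2.0, 1.0, 3.0
m = 4.0

def squared_cost(m):
    return (m * x + c - y) ** 2

def root_cost(m):
    return math.sqrt(m * x + c - y)

# Hand-derived partial derivatives with respect to m.
squared_grad = 2 * x * (m * x + c - y)
root_grad = x / (2 * math.sqrt(m * x + c - y))

print(squared_grad, numeric_derivative(squared_cost, m))
print(root_grad, numeric_derivative(root_cost, m))
```

If the analytic and numeric values disagree by more than a small tolerance, the derivation is probably off somewhere.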

If any or all of this is wrong, please (someone) correct me. This is something I am trying to learn myself, and I would appreciate knowing if I am heading in completely the wrong direction.
