How to use SGDRegressor in scikit-learn

Question

I'm trying to figure out how to properly use the SGDRegressor model in scikit-learn. In order to fit a dataset I need to call the function fit(X, y),

where X is a numpy array of shape (n_samples, n_features) and y is a 1-D array of length n_samples. I'm trying to figure out what y should represent.

For example, my data looks like this:

(image of the data table, not available)

My features are years starting in 1972, and the values are the corresponding value for each year. I am trying to predict values for future years, for example 2008 or 2012. I am assuming that every row in my data should be a row (sample) in X, where every element of that row is the value for a year. In that case, what would y be? I thought y should just be the years, but then y would have length n_features instead of n_samples. If y has to have length n_samples, then what could y be that has length 5 (the number of samples in the data above)? I think I must transform this data in some way.

python numpy scikit-learn statistics machine-learning

Jordan bramble May 22 '15 @ 2:20 am

2 answers

farhawa · Answer 1 · 2015-05-22T13:07:43+0000

y is your target (what you want to predict), and you can use it like this:

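from sklearn import linear_model

# X_to_train has shape (n_samples, n_features); y_to_train has shape (n_samples,)
clf = linear_model.SGDRegressor()
clf.fit(X_to_train, y_to_train)

# clf is now a trained model; predict targets for new samples
y_predicted = clf.predict(X_to_predict)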
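To make the snippet above concrete, here is a minimal runnable sketch. The data is synthetic and purely an assumption for illustration (100 samples, 3 features, made-up coefficients); it is not the asker's data. The point is only that y holds one target per row of X.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)

# Hypothetical training data: 100 samples, 3 features, roughly standardized
X_to_train = rng.normal(size=(100, 3))
# One target per sample: a made-up linear function of the features plus noise
y_to_train = X_to_train @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

clf = SGDRegressor(max_iter=1000, tol=1e-3, random_state=0)
clf.fit(X_to_train, y_to_train)

# New samples must have the same number of features as the training data
X_to_predict = rng.normal(size=(5, 3))
y_predicted = clf.predict(X_to_predict)

print(X_to_train.shape, y_to_train.shape, y_predicted.shape)
```

predict returns one value per row of X_to_predict, so the output has the same length as the number of new samples.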

IVlad · Answer 2 · 2015-05-22T07:18:11+0000

In machine learning, y represents the labels or targets of your data, that is, the correct answers for your training data (X).

If you want to predict the values corresponding to years, then those years will be your training data (X), and the corresponding values will be your targets (y).

You may notice that this matches the dimensions given in the question: X will have shape (n_samples, n_features) because it will have as many rows as you have years, and each row will have size 1 (you only have one feature, the year), and y will have length n_samples because you have a value associated with each year.
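As a rough sketch of this layout for a single series: the years become a column of shape (n_samples, 1) and the values become y. The numbers below are invented for illustration, not taken from the question. The StandardScaler step is my own addition, on the assumption that SGD-based estimators behave poorly with unscaled inputs as large as raw years.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical series: one made-up value per year (not the asker's data)
years = np.arange(1972, 2008)
values = 2.0 * (years - 1972) + 10.0

X = years.reshape(-1, 1).astype(float)  # shape (n_samples, 1): one feature, the year
y = values                              # shape (n_samples,): one target per year

# Scale the year feature before SGD; raw values near 2000 can make SGD diverge
model = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(X, y)

# Future years are passed in the same (n_samples, 1) layout
future = np.array([[2008.0], [2012.0]])
predictions = model.predict(future)

print(X.shape, y.shape, predictions.shape)
```

If the table in the question holds several independent series (one per row), one option under this reading is to fit a separate model per row in the same way.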
