How to use SGDRegressor in scikit-learn
I'm trying to figure out how to properly use the SGDRegressor model in scikit-learn. To fit a dataset I need to call fit(X, y),
where X is a numpy array of shape (n_samples, n_features) and y is a 1-D array of length n_samples. I'm trying to figure out what y should represent.
For example, my data looks like this:
My features are years starting in 1972, and the values are the corresponding value for each year. I am trying to predict values for future years, for example 2008 or 2012. I am assuming that every row in my data should be one row (sample) in X, where every element in that row is the value for a year. In that case, what would y be? I thought y should just be the years, but then y would have length n_features instead of n_samples. If y has to be n_samples long, then what could y be that has length 5 (the number of samples in my data)? I think I must transform this data in some way.
In a machine learning setting, y represents the labels or targets of your data. That is, y holds the correct answers for your training data (X).
If you want to predict the values corresponding to years, then the years are your training data (X), and the corresponding values are your targets (y).
You may notice that this matches the dimensions given in your first paragraph: X will have shape (n_samples, n_features), because it will have as many rows as you have years and each row will have size 1 (you only have 1 feature, the year), and y will have length n_samples, because you have one value associated with each year.
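
As a minimal sketch of that layout, the snippet below builds X from the years and y from the values, then fits an SGDRegressor. The numbers are made up for illustration (the question's actual data is not reproduced here), and the StandardScaler step is an assumption added because SGD tends to behave badly on unscaled inputs as large as calendar years.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical series: one value per year (illustrative numbers only).
years = np.array([1972, 1976, 1980, 1984, 1988, 1992, 1996, 2000, 2004])
values = np.array([10.0, 12.1, 13.9, 16.2, 18.0, 19.8, 22.1, 24.0, 26.1])

# One sample per year, one feature (the year itself).
X = years.reshape(-1, 1)   # shape (n_samples, 1)
y = values                 # shape (n_samples,)

# Scale the year feature before SGD; this step is an added assumption,
# not something the answer above prescribes.
model = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(X, y)

# Predict for future years; the input must also be 2-D: (n_queries, 1).
future = np.array([[2008], [2012]])
predictions = model.predict(future)
print(predictions)
```

If you have several independent series (for example, 5 rows that each cover the same years), one option under the same assumptions is to fit one such model per row, since each row is then its own set of (year, value) pairs.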