Least squares in practice

I have a very simple regression problem: three variables x1, x2, x3 with random noise, and I know the model equation y = q1*x1 + q2*x2 + q3*x3.

Now I want to find the coefficients q1, q2, q3 and evaluate the fit using the mean relative squared error (RSE), (Prediction/Real - 1)^2, averaged over all samples.
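For reference, this is how I would compute that metric (a minimal sketch; pred and real are placeholder arrays of predicted and true values):

import numpy as np

def mean_relative_squared_error(pred, real):
    # mean of (Prediction/Real - 1)^2 over all samples
    return np.mean((pred / real - 1.0) ** 2)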

From my research, this looks like a standard least squares problem, but I can't figure out from the examples on the internet how to solve this particular problem in Python. Let's say I have data:

import numpy as np

sourceData = np.random.rand(1000, 3)   # 1000 samples of x1, x2, x3
koefs = np.array([1, 2, 3])            # the true q1, q2, q3
target = np.dot(sourceData, koefs)     # y for each sample


(In real life the data is noisy and not normally distributed.) How do I find these koefs using a least squares approach in Python? Any library is fine.



2 answers


@ayhan made a valuable comment.

And there is a problem with your code: there is actually no noise in the data you generate. The inputs are random, but the target is an exact linear function of them, so after the multiplication nothing perturbs the measurements.

I added some noise to your measurements and used the closed-form least squares (normal equations) formula, theta = inv(X.T @ X) @ X.T @ y, to fit the parameters. Here's my code:

import numpy as np

data = np.random.rand(1000, 3)

true_theta = np.array([1, 2, 3])
true_measurements = np.dot(data, true_theta)

# Perturb the measurements so the fit is no longer exact
noise = np.random.rand(1000) * 1  # adjust the factor to scale the noise level

noisy_measurements = true_measurements + noise

# Normal equations: theta = inv(X.T X) X.T y
estimated_theta = np.linalg.inv(data.T @ data) @ data.T @ noisy_measurements


estimated_theta will be close to true_theta. If you don't add noise to the measurements, they will be equal.
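A quick sanity check (a sketch reusing the variables above; the second print is True up to floating-point precision):

print(np.abs(estimated_theta - true_theta))  # small but nonzero, because of the noise

# Without noise, the normal equations recover the coefficients exactly
exact_theta = np.linalg.inv(data.T @ data) @ data.T @ true_measurements
print(np.allclose(exact_theta, true_theta))  # True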

I used the Python 3 matrix multiplication operator @. You can use np.dot instead.



Using np.dot makes the line longer, so I split up the formula:

MTM_inv = np.linalg.inv(np.dot(data.T, data))  # inv(X.T X)
MTy = np.dot(data.T, noisy_measurements)       # X.T y
estimated_theta = np.dot(MTM_inv, MTy)
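As a side note, np.linalg.solve solves the normal equations without forming the inverse explicitly, which is generally safer numerically; the same computation as a sketch:

estimated_theta = np.linalg.solve(np.dot(data.T, data), np.dot(data.T, noisy_measurements))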


You can read more about least squares here: https://en.wikipedia.org/wiki/Linear_least_squares_(mathematics)#The_general_problem

UPDATE:

Or you can simply use the built-in least squares method:

# lstsq returns (solution, residuals, rank, singular values); take the solution
estimated_theta, *_ = np.linalg.lstsq(data, noisy_measurements, rcond=None)




In addition to @lhk's answer, I found scipy's excellent least_squares function. It's easy to get the requested behavior with it.

We can provide a custom function that returns the residuals as relative errors, so the optimizer minimizes the relative squared error instead of the absolute squared difference:



import numpy as np
from scipy.optimize import least_squares

data = np.random.rand(1000, 3)

true_theta = np.array([1, 2, 3])
true_measurements = np.dot(data, true_theta)

noise = np.random.rand(1000) * 1

noisy_measurements = true_measurements + noise
# Uncomment this outlier to see how much better the relative squared error
# estimator works than the default absolute difference in this case:
#noisy_measurements[-1] = data[-1] @ (1000 * true_theta)


def my_func(params, x, y):
    # Relative residuals; changing this line to (x @ params) - y
    # would give the same result as np.linalg.lstsq
    return (x @ params) / y - 1


x0 = np.ones(3)  # initial guess for q1, q2, q3
res = least_squares(my_func, x0, args=(data, noisy_measurements))
estimated_theta = res.x
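The fitted parameters are in res.x, and res.cost holds half the sum of squared residuals at the solution (a quick look, reusing the variables above):

print(res.x)     # fitted q1, q2, q3
print(res.cost)  # 0.5 * sum of squared relative residuals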


Alternatively, we can pass a custom loss via the loss argument, which transforms the residuals before they are aggregated into the final cost.
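For example, here is a sketch using one of SciPy's built-in robust losses, 'soft_l1', with the same my_func and x0 as above:

res_robust = least_squares(my_func, x0, loss='soft_l1', args=(data, noisy_measurements))
print(res_robust.x)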
