How do I calculate studentized residuals in Python?
I tried to find an answer to this problem, but I haven't found one so far. I used statsmodels to fit an ordinary least squares (OLS) regression model on a medium-sized dataset. I can access the list of residuals in the OLS results, but not the studentized residuals. How can I calculate or obtain the studentized residuals? I know a formula for calculating them, but I'm not really sure how to code it in Python.
Thanks in advance.
UPDATE: I found the answer. I can get a DataFrame containing the studentized residuals from the outlier_test() function of the OLS results.
For simple linear regression, you can compute the studentized residuals as follows.
Define the means of X and Y as:
mean_X = sum(X) / len(X)
mean_Y = sum(Y) / len(Y)
Now estimate the coefficients beta0 and beta1:
beta1 = sum([(X[i] - mean_X)*(Y[i] - mean_Y) for i in range(len(X))]) / sum([(X[i] - mean_X)**2 for i in range(len(X))])
beta0 = mean_Y - beta1 * mean_X
Now compute the fitted values:
y_hat = [beta0 + beta1*X[i] for i in range(len(X))]
Now calculate the residuals, which are Y - y_hat:
residuals = [Y[i] - y_hat[i] for i in range(len(Y))]
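The steps up to this point can also be written with NumPy arrays instead of list comprehensions. This is only a sketch on made-up data; the cross-check against `np.polyfit` is an addition for illustration and not part of the original answer.

```python
import numpy as np

# Illustrative data (made up for this example).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

mean_X, mean_Y = X.mean(), Y.mean()

# Least-squares slope and intercept for simple linear regression.
beta1 = np.dot(X - mean_X, Y - mean_Y) / np.dot(X - mean_X, X - mean_X)
beta0 = mean_Y - beta1 * mean_X

y_hat = beta0 + beta1 * X       # fitted values
residuals = Y - y_hat           # raw residuals

# Cross-check: a degree-1 polynomial fit gives the same line.
slope, intercept = np.polyfit(X, Y, 1)
print(beta0, beta1)
```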
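We need to find the hat matrix H, that is, H = X (X^T X)^(-1) X^T, where X here denotes the design matrix of our independent variables (a column of ones for the intercept plus the column of predictor values).
To find the leverage, we take the diagonal elements of the matrix H as follows:
import numpy
X_design = numpy.column_stack([numpy.ones(len(X)), X])  # intercept column + predictor
H = X_design @ numpy.linalg.inv(X_design.T @ X_design) @ X_design.T
leverage = numpy.diagonal(H)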
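For simple linear regression the leverage also has a closed form, 1/n + (x_i - mean)^2 / sum((x_j - mean)^2), which avoids building the full n-by-n matrix. The sketch below, on made-up data, checks the closed form against the diagonal of the hat matrix. The names `design` and `closed_form` are hypothetical and chosen for this example only.

```python
import numpy as np

# Illustrative predictor values (made up for this example).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Design matrix: a column of ones (intercept) next to the predictor.
design = np.column_stack([np.ones_like(x), x])

# Hat matrix H = X (X^T X)^(-1) X^T and its diagonal, the leverages.
H = design @ np.linalg.inv(design.T @ design) @ design.T
leverage = np.diagonal(H)

# Closed form for one predictor plus intercept.
centered = x - x.mean()
closed_form = 1.0 / len(x) + centered ** 2 / np.sum(centered ** 2)

print(leverage)
```

The leverages sum to the number of estimated coefficients, which is 2 here.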
Find the standard error of each residual:
import math
Var_e = sum([(Y[i] - y_hat[i])**2 for i in range(len(Y))]) / (len(Y) - 2)
SE_regression = [math.sqrt(Var_e * (1 - leverage[i])) for i in range(len(leverage))]
Now you can calculate the studentized residuals:
studentized_residuals = [residuals[i] / SE_regression[i] for i in range(len(residuals))]
Please note that there are two types of studentized residuals: internally studentized residuals and externally studentized residuals.
My solution computes the internally studentized residuals.
I corrected my calculations. For externally studentized residuals, see @kkawabat's answer.
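For reference, the two variants can be written as below. The notation is an assumption introduced here, not taken from the original post: e_i is the raw residual, h_ii the leverage, n the number of observations, p the number of estimated coefficients (p = 2 for simple linear regression), and s the residual standard error.

```latex
\begin{align*}
s^2 &= \frac{1}{n-p}\sum_{i=1}^{n} e_i^2, \\
r_i &= \frac{e_i}{s\sqrt{1-h_{ii}}} && \text{(internally studentized)}, \\
t_i &= r_i\sqrt{\frac{n-p-1}{n-p-r_i^2}} && \text{(externally studentized)}.
\end{align*}
```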
Nodar's implementation is wrong. Here is a corrected formula from https://newonlinecourses.science.psu.edu/stat501/node/339/, and also a deleted studentized residual, in case people don't want to use the statsmodels package. Both formulas return the same results as the examples in the link above.
import math
import numpy as np

def internally_studentized_residual(X, Y):
    X = np.array(X, dtype=float)
    Y = np.array(Y, dtype=float)
    mean_X = np.mean(X)
    mean_Y = np.mean(Y)
    n = len(X)
    # Least-squares slope and intercept
    diff_mean_sqr = np.dot((X - mean_X), (X - mean_X))
    beta1 = np.dot((X - mean_X), (Y - mean_Y)) / diff_mean_sqr
    beta0 = mean_Y - beta1 * mean_X
    y_hat = beta0 + beta1 * X
    residuals = Y - y_hat
    # Leverage of each observation (diagonal of the hat matrix)
    h_ii = (X - mean_X) ** 2 / diff_mean_sqr + (1 / n)
    # Despite its name, Var_e holds the residual standard error (a square root)
    Var_e = math.sqrt(sum((Y - y_hat) ** 2) / (n - 2))
    SE_regression = Var_e * ((1 - h_ii) ** 0.5)
    studentized_residuals = residuals / SE_regression
    return studentized_residuals

def deleted_studentized_residual(X, Y):
    # formula from https://newonlinecourses.science.psu.edu/stat501/node/401/
    r = internally_studentized_residual(X, Y)
    n = len(r)
    return [r_i * math.sqrt((n - 2 - 1) / (n - 2 - r_i ** 2)) for r_i in r]
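One way to convince yourself that the closed-form conversion is right is to compare it with a brute-force version that really deletes each observation and refits the line. The following is a sketch on made-up data. The helper names `fit_line` and `studentized_three_ways` are hypothetical and not part of the answer above.

```python
import numpy as np

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    slope = np.dot(x - x.mean(), y - y.mean()) / np.dot(x - x.mean(), x - x.mean())
    return y.mean() - slope * x.mean(), slope

def studentized_three_ways(x, y):
    """Internal residuals, closed-form external, and leave-one-out external."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    e = y - (b0 + b1 * x)                                   # raw residuals
    h = 1.0 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)  # leverages
    s = np.sqrt(np.sum(e ** 2) / (n - 2))                   # residual standard error
    internal = e / (s * np.sqrt(1 - h))
    closed_form = internal * np.sqrt((n - 3) / (n - 2 - internal ** 2))

    brute = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                            # drop observation i
        c0, c1 = fit_line(x[keep], y[keep])
        # Standard error from the fit without i: n - 1 points, 2 coefficients.
        s_i = np.sqrt(np.sum((y[keep] - c0 - c1 * x[keep]) ** 2) / (n - 3))
        brute[i] = e[i] / (s_i * np.sqrt(1 - h[i]))
    return internal, closed_form, brute

# Illustrative data (made up for this example).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 2.3, 2.9, 4.2, 4.8, 7.5])

internal, closed_form, brute = studentized_three_ways(x, y)
print(np.allclose(closed_form, brute))
```

The two external versions agree because removing one point changes the residual variance by a known amount, so no refitting is needed in practice.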