How do I calculate studentized residuals in Python?

I tried to find an answer to this problem, but I haven't found one so far. I have used statsmodels to fit an ordinary least squares (OLS) regression model on a medium-sized dataset. I can access the list of residuals in the OLS results, but not the studentized residuals. How can I calculate / obtain studentized residuals? I know the formula for calculating studentized residuals, but I'm not sure how to code this formula in Python.

Thanks in advance.

UPDATE: I found the answer. I can get a DataFrame containing the studentized residuals from the outlier_test() method of the OLS results.
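For reference, a minimal sketch of that route (the data and variable names are made up for illustration; outlier_test() returns a DataFrame of studentized residuals with p-values, and the influence object exposes both the internal and external variants):

import numpy as np
import statsmodels.api as sm

# Made-up example data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.7, 12.3])

results = sm.OLS(Y, sm.add_constant(X)).fit()

# DataFrame with studentized residuals and p-values
print(results.outlier_test())

# Internally and externally studentized residuals
influence = results.get_influence()
print(influence.resid_studentized_internal)
print(influence.resid_studentized_external)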


2 answers


For simple linear regression, you can compute the studentized residuals using the following steps.

Determine the means of X and Y as:

mean_X = sum(X) / len(X) 
mean_Y = sum(Y) / len(Y) 


Now you need to estimate the coefficients beta0 and beta1:

beta1 = sum([(X[i] - mean_X)*(Y[i] - mean_Y) for i in range(len(X))]) / sum([(X[i] - mean_X)**2 for i in range(len(X))]) 
beta0 = mean_Y - beta1 * mean_X


Now you need to find the fitted values using:

y_hat = [beta0 + beta1*X[i] for i in range(len(X))]


Now calculate the residuals, which are Y - y_hat:

residuals = [Y[i] - y_hat[i] for i in range(len(Y))]


We need to find the hat matrix H = X (XᵀX)⁻¹ Xᵀ, where X is the design matrix of our independent variables (including a column of ones for the intercept).


To find the leverage, we take the diagonal elements of the matrix H as follows:

leverage = np.diagonal(H)


Find the standard error of the regression as follows:

import math

Var_e = sum([(Y[i] - y_hat[i])**2 for i in range(len(Y))]) / (len(Y) - 2)
SE_regression = [math.sqrt(Var_e * (1 - leverage[i])) for i in range(len(leverage))]


Now you can calculate the studentized residuals:

studentized_residuals = [residuals[i] / SE_regression[i] for i in range(len(residuals))]


Please note that there are two types of studentized residuals. One is the internally studentized residual, and the other is the externally studentized residual.

My solution finds the internally studentized residuals.

I have corrected my calculations. For externally studentized residuals, see @kkawabat's answer.



Nodar's implementation was incorrect; here is the corrected formula from https://newonlinecourses.science.psu.edu/stat501/node/339/, along with the deleted (externally) studentized residual for people who don't want to use the statsmodels package. Both functions return the same results as the examples in the link above.



import math
import numpy as np

def internally_studentized_residual(X, Y):
    X = np.array(X, dtype=float)
    Y = np.array(Y, dtype=float)
    mean_X = np.mean(X)
    mean_Y = np.mean(Y)
    n = len(X)
    diff_mean_sqr = np.dot((X - mean_X), (X - mean_X))
    # Least-squares estimates of the slope and intercept
    beta1 = np.dot((X - mean_X), (Y - mean_Y)) / diff_mean_sqr
    beta0 = mean_Y - beta1 * mean_X
    y_hat = beta0 + beta1 * X
    residuals = Y - y_hat
    # Leverage of each observation
    h_ii = (X - mean_X) ** 2 / diff_mean_sqr + (1 / n)
    # Residual standard error (square root of the mean squared error)
    Var_e = math.sqrt(sum((Y - y_hat) ** 2) / (n - 2))
    SE_regression = Var_e * ((1 - h_ii) ** 0.5)
    studentized_residuals = residuals / SE_regression
    return studentized_residuals

def deleted_studentized_residual(X,Y):
    #formula from https://newonlinecourses.science.psu.edu/stat501/node/401/
    r = internally_studentized_residual(X,Y)
    n = len(r)
    return [r_i*math.sqrt((n-2-1)/(n-2-r_i**2)) for r_i in r]
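
A quick illustrative usage with made-up data (the statsmodels cross-check is optional and just confirms the two functions agree with the library's own studentized residuals):

X = [1, 2, 3, 4, 5, 6]
Y = [2.1, 3.9, 6.2, 8.1, 9.7, 12.3]

print(internally_studentized_residual(X, Y))
print(deleted_studentized_residual(X, Y))

# Optional cross-check against statsmodels
import statsmodels.api as sm
influence = sm.OLS(Y, sm.add_constant(X)).fit().get_influence()
print(influence.resid_studentized_internal)
print(influence.resid_studentized_external)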

