Is there a way to check for linearly dependent columns in a dataframe?

Is there a way to check for linear dependence among the columns of a pandas DataFrame? For example:

import pandas as pd

columns = ['A', 'B', 'C']
df = pd.DataFrame(columns=columns)
df.A = [0, 2, 3, 4]
df.B = df.A * 2      # B is an exact multiple of A
df.C = [8, 3, 5, 4]  # C is unrelated to A and B
print(df)

   A  B  C
0  0  0  8
1  2  4  3
2  3  6  5
3  4  8  4

      

Is there a way to show that column B is a linear combination of A, but C is an independent column? My ultimate goal is to run a Poisson regression on the dataset, but I keep getting the error LinAlgError: Singular matrix, i.e. the matrix built from my data has no inverse, which means it contains dependent columns.

I would like a programmatic way to check each feature and make sure there are no dependent columns.



1 answer


If you have SymPy, you can compute the reduced row echelon form via sympy.Matrix.rref:

>>> import sympy 
>>> reduced_form, inds = sympy.Matrix(df.values).rref()
>>> reduced_form
Matrix([
[1.0, 2.0,   0],
[  0,   0, 1.0],
[  0,   0,   0],
[  0,   0,   0]])

>>> inds
[0, 2]

      



The pivot columns (stored as inds) are the indices of the linearly independent columns, so you can simply select those and drop the others:

>>> df.iloc[:, list(inds)]  # list() also covers SymPy versions that return the pivots as a tuple
   A  C
0  0  8
1  2  3
2  3  5
3  4  4
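If SymPy is not available, the same pivot-column idea can be sketched with the standard library alone. This is only an illustration, not what SymPy does internally: the helper name pivot_columns is made up for this example, the data is copied by hand from the frame in the question, and exact Fraction arithmetic is assumed so that rounding cannot hide a dependency. It also reads the coefficients of each dependent column off the reduced matrix, which shows that B = 2*A.

```python
from fractions import Fraction


def pivot_columns(rows):
    """Hypothetical helper: Gauss-Jordan elimination with exact fractions.

    Returns (rref, pivots), where pivots lists the indices of the
    linearly independent columns.
    """
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    pivots = []
    r = 0  # row where the next pivot will be placed
    for c in range(n_cols):
        if r == n_rows:
            break
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot: column c depends on earlier pivot columns
        m[r], m[p] = m[p], m[r]
        pivot_value = m[r][c]
        m[r] = [x / pivot_value for x in m[r]]
        # Clear column c in every other row.
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


# Same values as the example frame; with pandas this would be df.values.tolist().
names = ["A", "B", "C"]
data = [[0, 0, 8], [2, 4, 3], [3, 6, 5], [4, 8, 4]]

rref, pivots = pivot_columns(data)
independent = [names[i] for i in pivots]
dependent = [n for i, n in enumerate(names) if i not in pivots]

# For a non-pivot column c, row k of the reduced matrix holds the
# coefficient on the k-th pivot column.
combos = {
    names[c]: {names[p]: rref[k][c] for k, p in enumerate(pivots) if rref[k][c] != 0}
    for c in range(len(names))
    if c not in pivots
}

print(independent)
print(dependent)
print(combos)
```

For real-valued features, exact fractions are usually not practical, so a tolerance-based rank check would be the more realistic choice; the sketch above is meant only to make the pivot logic visible.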

      
