Is there a way to check for linearly dependent columns in a dataframe?
Is there a way to check for linear dependence among the columns of a pandas DataFrame? For example:
import pandas as pd
columns = ['A','B', 'C']
df = pd.DataFrame(columns=columns)
df.A = [0,2,3,4]
df.B = df.A*2
df.C = [8,3,5,4]
print(df)
   A  B  C
0  0  0  8
1  2  4  3
2  3  6  5
3  4  8  4
Is there a way to show that column B is a linear combination of A, but C is an independent column? My ultimate goal is to run a Poisson regression on the dataset, but I keep getting the error LinAlgError: Singular matrix, i.e. the matrix built from my data has no inverse and therefore must contain dependent columns.
I would like a programmatic way to test each feature and make sure there are no dependent columns.
If you have SymPy, you can use the reduced row echelon form via sympy.Matrix.rref:
>>> import sympy
>>> reduced_form, inds = sympy.Matrix(df.values).rref()
>>> reduced_form
Matrix([
[1.0, 2.0, 0],
[ 0, 0, 1.0],
[ 0, 0, 0],
[ 0, 0, 0]])
>>> inds
[0, 2]
The pivot columns (stored as inds) are the indices of the linearly independent columns, so you can simply select them and drop the others (wrapping inds in list() so it works whether rref returns a list or a tuple):
>>> df.iloc[:, list(inds)]
   A  C
0  0  8
1  2  3
2  3  5
3  4  4
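If SymPy is not available, a similar check can be sketched with NumPy alone. This is not part of the answer above, just one possible approach: walk the columns left to right and keep a column only if it raises the rank of the columns kept so far. The helper name independent_columns and its tol parameter are made up for this illustration; np.linalg.matrix_rank is the only library call it relies on.

```python
import numpy as np

def independent_columns(data, tol=None):
    """Return indices of a maximal set of linearly independent columns.

    Greedy, left to right: a column is kept only if adding it raises
    the rank of the columns kept so far. (Hypothetical helper, not a
    pandas or NumPy API.)
    """
    X = np.asarray(data, dtype=float)
    keep = []
    for j in range(X.shape[1]):
        candidate = keep + [j]
        # Full column rank means column j is independent of the kept ones.
        if np.linalg.matrix_rank(X[:, candidate], tol=tol) == len(candidate):
            keep.append(j)
    return keep

# Same values as the question's frame: B = 2 * A, C is unrelated.
X = np.array([[0, 0, 8],
              [2, 4, 3],
              [3, 6, 5],
              [4, 8, 4]])
kept = independent_columns(X)
dropped = [j for j in range(X.shape[1]) if j not in kept]
print(kept, dropped)
```

With the question's frame, this could be used as df.iloc[:, independent_columns(df.values)]. Because the rank is computed numerically, nearly collinear columns may be kept or dropped depending on tol, so treat the result as a diagnostic rather than a proof.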