What is the most efficient way to find the largest linearly independent subset of expressions?

I have a list of expressions:

from sympy import *

x = symbols('x')

e0 = x
e1 = x**2
e2 = 2*x**2

How do I find the largest subset of linearly independent expressions? You may assume that the expressions are ordered by preference, i.e. the expression with the lower subscript is preferred.

I tried repeating the following:

a = numbered_symbols('a')
a0 = next(a)
a1 = next(a)
a2 = next(a)

solve(a0*e0 + a1*e1, a0, a1)
# {a0: 0, a1: 0}

solve(a0*e0 + a1*e1 + a2*e2, a0, a1, a2)
# {a1: -2*a2, a0: 0}

So I accept e0 and e1 and reject e2. To automate this:

from operator import mul
from toolz import take

def _linear_independent(exprs):
    # one unknown coefficient per expression
    c = list(take(len(exprs), numbered_symbols("c")))
    # c0*e0 + c1*e1 + ... == 0 has only the trivial solution
    # iff the expressions are linearly independent
    expr = sum(map(mul, exprs, c))
    res = solve(expr, c)
    return all(v == 0 for v in res.values())

def max_independent_set(exprs):
    # greedily grow the set, preferring lower-subscript expressions
    max_set = [exprs[0]]
    for e in exprs[1:]:
        if _linear_independent(max_set + [e]):
            max_set.append(e)
    return max_set

max_independent_set([e0, e1, e2]) # [x, x**2]

Is there a more efficient (runtime) way to do this? Currently I have to call solve N-1 times, and the system being solved keeps growing. Maybe this can be broken down into smaller subproblems?

Bonus: I'm also looking for a way to do this with multiple independent variables. My current approach doesn't work (solve treats x and y as ordinary symbols and returns a relation between the coefficients instead of only the trivial solution):

x, y = symbols('x y')
e0 = x
e1 = y
exprs = [e0, e1]

c = list(take(len(exprs), numbered_symbols("c")))
expr = sum(map(mul, exprs, c))
res = solve(expr, c)  # [{c0: -c1*y/x}]

My expressions describe functions from R^N → R. Previously I evaluated them on my dataset and excluded expressions based on correlation.
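Since the expressions are functions from R^N → R, a purely numeric alternative (my own sketch, not part of the original approach) is to sample each expression at a few random points and greedily keep only those that increase the numerical rank of the sample matrix:

```python
import numpy as np
from sympy import lambdify, symbols

x = symbols('x')
exprs = [x, x**2, 2*x**2]

# sample each expression at a few random points (assumes the
# expressions are defined on the sampled interval)
rng = np.random.default_rng(0)
pts = rng.uniform(1.0, 2.0, size=10)
rows = [lambdify(x, e)(pts) for e in exprs]

# greedily keep an expression only if it raises the numerical rank
kept, basis = [], np.empty((0, pts.size))
for e, row in zip(exprs, rows):
    candidate = np.vstack([basis, row])
    if np.linalg.matrix_rank(candidate) > basis.shape[0]:
        kept.append(e)
        basis = candidate
# kept == [x, x**2]
```

For multiple variables, lambdify a tuple of symbols and sample one random array per variable. This trades exactness for speed: dependence is only detected up to the floating-point tolerance of matrix_rank.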



2 answers


Building on the other answer, I came up with a solution.

Using nullspace works great, but you still need a way to extract the list of independent/atomic terms from the expressions:

import itertools as it
import operator
from functools import reduce
import numpy as np
import sympy

def get_atomic(expr, normalize=True):
    # strip a leading numeric coefficient from a product, e.g. 2*x**2 -> x**2
    if isinstance(expr, sympy.Mul):
        if isinstance(expr.args[0], sympy.Number):
            args_ = expr.args[1:]
        else:
            args_ = expr.args
        return reduce(operator.mul, args_)
    elif isinstance(expr, (sympy.Float, sympy.Integer)):
        return 1 if normalize else expr
    else:
        return expr

# collect the distinct atomic terms appearing in any summand of any expression
all_summands = ((get_atomic(s) for s in sympy.Add.make_args(e)) for e in exprs)
atomic = list(set(it.chain.from_iterable(all_summands)))
A, b = sympy.linear_eq_to_matrix(exprs, atomic)

# the last 1-entry of each nullspace vector marks a redundant expression
redundant_indices = [np.where(sympy.matrix2numpy(nullvector) == 1)[0][-1]
                     for nullvector in A.T.nullspace()]

exprs_ = [e for i, e in enumerate(exprs) if i not in redundant_indices]

This works well for my test cases:

exprs = [x, x**2, x**3, x**3 - x**2]
# exprs_ = [x, x**2, x**3]
exprs = [cos(x), cos(x)*sin(y), sin(y)]
# exprs_ = [cos(x), cos(x)*sin(y), sin(y)]
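Here is the same snippet folded into one runnable function. Note that newer SymPy releases require plain Symbols in linear_eq_to_matrix, so this sketch substitutes a fresh dummy symbol for each atomic term first; the helper name drop_redundant is mine:

```python
import itertools as it
import operator
from functools import reduce
import numpy as np
import sympy
from sympy import cos, sin, symbols

def get_atomic(expr):
    # strip a leading numeric coefficient from a product, e.g. 2*x**2 -> x**2
    if isinstance(expr, sympy.Mul):
        args_ = expr.args[1:] if isinstance(expr.args[0], sympy.Number) else expr.args
        return reduce(operator.mul, args_)
    if isinstance(expr, (sympy.Float, sympy.Integer)):
        return 1
    return expr

def drop_redundant(exprs):
    summands = ((get_atomic(s) for s in sympy.Add.make_args(e)) for e in exprs)
    atomic = list(set(it.chain.from_iterable(summands)))
    # newer SymPy wants plain Symbols, so stand in a dummy for each term
    dummies = sympy.symbols(f'_d0:{len(atomic)}')
    A, _ = sympy.linear_eq_to_matrix(
        [e.subs(dict(zip(atomic, dummies))) for e in exprs], dummies)
    # the last 1-entry of each nullspace vector marks a redundant expression
    redundant = {int(np.where(sympy.matrix2numpy(v) == 1)[0][-1])
                 for v in A.T.nullspace()}
    return [e for i, e in enumerate(exprs) if i not in redundant]

x, y = symbols('x y')
drop_redundant([x, x**2, x**3, x**3 - x**2])     # [x, x**2, x**3]
drop_redundant([cos(x), cos(x)*sin(y), sin(y)])  # all three kept
```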



You can use some matrix routines to compute this. The function linear_eq_to_matrix converts a system of expressions into a matrix:

>>> A, b = linear_eq_to_matrix([x, x**2, 2*x**2], [x, x**2])
>>> pprint(A)
⎡1  0⎤
⎢    ⎥
⎢0  1⎥
⎢    ⎥
⎣0  2⎦

(If the expressions contain constant terms, those are moved to b, the right-hand side of the equation.) This matrix is the transpose of what you want, since the matrix operations you need work on columns. A.T.columnspace() returns the columns that span the column space of A.T:

>>> A, b = linear_eq_to_matrix([x, x**2, 2*x**2], [x, x**2])
>>> pprint(A.T.columnspace())
⎡⎡1⎤  ⎡0⎤⎤
⎢⎢ ⎥, ⎢ ⎥⎥
⎣⎣0⎦  ⎣1⎦⎦

This indicates that the first and second elements span the space (since you got the first and second columns of A.T). If you also want to know how to rewrite the remaining elements in terms of these linearly independent ones, use A.T.nullspace().

For example:

>>> pprint(A.T.nullspace())
⎡⎡0 ⎤⎤
⎢⎢  ⎥⎥
⎢⎢-2⎥⎥
⎢⎢  ⎥⎥
⎣⎣1 ⎦⎦

This means that -2*(x**2) + 1*(2*x**2) = 0, so the last element (2*x**2) is linearly dependent and can be rewritten in terms of the second.

For a bigger example:

>>> A, b = linear_eq_to_matrix([x, 2*x, x**2, 2*x**2, x**3, x + x**2], [x, x**2, x**3])
>>> pprint(A.T)
⎡1  2  0  0  0  1⎤
⎢                ⎥
⎢0  0  1  2  0  1⎥
⎢                ⎥
⎣0  0  0  0  1  0⎦
>>> pprint(A.T.columnspace())
⎡⎡1⎤  ⎡0⎤  ⎡0⎤⎤
⎢⎢ ⎥  ⎢ ⎥  ⎢ ⎥⎥
⎢⎢0⎥, ⎢1⎥, ⎢0⎥⎥
⎢⎢ ⎥  ⎢ ⎥  ⎢ ⎥⎥
⎣⎣0⎦  ⎣0⎦  ⎣1⎦⎦
>>> pprint(A.T.nullspace())
⎡⎡-2⎤  ⎡0 ⎤  ⎡-1⎤⎤
⎢⎢  ⎥  ⎢  ⎥  ⎢  ⎥⎥
⎢⎢1 ⎥  ⎢0 ⎥  ⎢0 ⎥⎥
⎢⎢  ⎥  ⎢  ⎥  ⎢  ⎥⎥
⎢⎢0 ⎥  ⎢-2⎥  ⎢-1⎥⎥
⎢⎢  ⎥, ⎢  ⎥, ⎢  ⎥⎥
⎢⎢0 ⎥  ⎢1 ⎥  ⎢0 ⎥⎥
⎢⎢  ⎥  ⎢  ⎥  ⎢  ⎥⎥
⎢⎢0 ⎥  ⎢0 ⎥  ⎢0 ⎥⎥
⎢⎢  ⎥  ⎢  ⎥  ⎢  ⎥⎥
⎣⎣0 ⎦  ⎣0 ⎦  ⎣1 ⎦⎦

Note that there are 3 spanning nullspace vectors and 3 spanning column-space vectors, consistent with the rank-nullity theorem (3 + 3 = 6). For the column space we got the first, second and fifth columns of A.T, meaning those are the linearly independent elements (alternatively, multiplying each column against the vector of terms, Matrix([x, x**2, x**3]).T, recovers the corresponding expressions).

The final 1 in each nullspace vector marks an element that can be removed, and the entries above it (rather, their negations) tell you how to rewrite it in terms of the other elements (e.g., 2*x = -(-2)*x, 2*x**2 = -(-2)*x**2, and x + x**2 = -(-1)*x + -(-1)*x**2).

This requires you to start with a list of expressions that you treat as your terms (in this example [x, x**2, x**3]), and the choice can make a difference. To take an example from the comments on your question, [cos(x), cos(x)*sin(y)] is linearly dependent if your terms are just [cos(x)], not even linear if your terms are [cos(x), sin(y)], and linearly independent if they are [cos(x), cos(x)*sin(y)].
