Fast weighted scatter matrix calculation
In this question six months ago, jez was good enough to help me come up with a fast approximation for the sum of the outer products of the row differences, i.e.:
K = np.zeros((len(X), len(X)))
for i, Xi in enumerate(X):
    for j, Xj in enumerate(X):
        dij = Xi - Xj
        K += np.outer(dij, dij)
This handled the computation of the scatter matrix needed for Fisher discriminant analysis. But now I'm trying to do local Fisher discriminant analysis, where each outer product is weighted by a matrix A that holds pairwise locality information, so the new inner line is:
        K += A[i][j] * np.outer(dij, dij)
Unfortunately, the fast way to compute the unweighted scatter matrix from the previous answer doesn't work here, and as far as I can tell there's no easy modification.
Linear algebra is definitely not my strong suit. What is a fast way to compute this weighted sum of the pairwise outer products of the row differences?
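Putting the pieces together, here is the full weighted loop I'm trying to speed up, as a self-contained sketch (the small random X and A are just placeholder inputs, not my real data):

```python
import numpy as np

def weighted_scatter_loop(A, X):
    # Naive O(N^2) loop over all row pairs; this is the bottleneck.
    d = X.shape[1]
    K = np.zeros((d, d))
    for i, Xi in enumerate(X):
        for j, Xj in enumerate(X):
            dij = Xi - Xj
            K += A[i, j] * np.outer(dij, dij)
    return K

# Tiny placeholder inputs just to show the call.
rng = np.random.default_rng(0)
X = rng.random((10, 4))   # 10 rows, 4 features
A = rng.random((10, 10))  # pairwise weights
K = weighted_scatter_loop(A, X)
print(K.shape)  # (4, 4)
```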
Here is a way to vectorize the calculation you specified. If you do this kind of thing a lot, it may be worth learning to use numpy.tensordot. It multiplies all the elements according to standard numpy broadcasting, then sums over the axes given by the keyword argument axes.
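As a tiny illustration of how the axes argument to tensordot works (a sketch with made-up shapes, not part of the solution itself), summing over axes 0 and 1 of both arrays is equivalent to accumulating outer products of the trailing axes:

```python
import numpy as np

rng = np.random.default_rng(1)
b = rng.random((3, 4, 5))
c = rng.random((3, 4, 6))

# out[a, k] = sum over i, j of b[i, j, a] * c[i, j, k]
out = np.tensordot(b, c, axes=[(0, 1), (0, 1)])
print(out.shape)  # (5, 6)

# The same thing written as explicit loops:
ref = np.zeros((5, 6))
for i in range(3):
    for j in range(4):
        ref += np.outer(b[i, j], c[i, j])
assert np.allclose(out, ref)
```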
Here is the code:
# Imports
import numpy as np
from numpy.random import random

# Original loop-based calculation, kept for testing purposes
def ftrue(A, X):
    """Weighted scatter matrix via explicit Python loops (slow reference)."""
    KA_true = np.zeros((X.shape[1], X.shape[1]))
    for i, Xi in enumerate(X):
        for j, Xj in enumerate(X):
            dij = Xi - Xj
            KA_true += A[i, j] * np.outer(dij, dij)
    return KA_true

# Better: no Python loops, but makes a large temporary array.
def fbetter(A, X):
    """Vectorized, at the cost of an N*N*d temporary array."""
    c = X[:, None, :] - X[None, :, :]
    b = A[:, :, None] * c  # !BAD! temporary array of size N*N*d
    KA_better = np.tensordot(b, c, axes=[(0, 1), (0, 1)])
    return KA_better

# Best: no Python loops and no temporary larger than N x N.
def fbest(A, X):
    """Vectorized via the expanded outer product; only N x N temporaries."""
    KA_best = np.tensordot(A.sum(1)[:, None] * X, X, axes=[(0,), (0,)])
    KA_best += np.tensordot(A.sum(0)[:, None] * X, X, axes=[(0,), (0,)])
    KA_best -= np.tensordot(np.dot(A, X), X, axes=[(0,), (0,)])
    KA_best -= np.tensordot(X, np.dot(A, X), axes=[(0,), (0,)])
    return KA_best

# Test script
if __name__ == "__main__":
    # Parameters for the computation
    N = 250
    X = random((N, N))
    A = random((N, N))

    KA_better = fbetter(A, X)
    KA_best = fbest(A, X)

    # Test against the loop version if the array size isn't too big
    if N < 100:
        KA_true = ftrue(A, X)
        err = abs(KA_better - KA_true).mean()
        print("Mean absolute difference (better): {}.".format(err))

    # Test best against better
    err = abs(KA_best - KA_better).mean()
    print("Mean absolute difference (best): {}.".format(err))
My first attempt (fbetter) creates a large temporary array of size NxNxN. The second (fbest) never allocates anything larger than NxN, and works pretty well up to N ~ 1000.
The code also runs faster when the output array is smaller.
I have MKL installed, so the calls to tensordot are really fast and run in parallel.
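For the record, fbest rests on expanding the outer product, (x_i - x_j)(x_i - x_j)^T = x_i x_i^T + x_j x_j^T - x_i x_j^T - x_j x_i^T, so the weighted sum collapses to X.T @ diag(rowsums) @ X + X.T @ diag(colsums) @ X - X.T @ A.T @ X - X.T @ A @ X. This identity (my restatement, checked numerically below, not a quote from any reference) is easy to verify against the brute-force loop:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 20, 5
X = rng.random((N, d))
A = rng.random((N, N))

# Brute-force weighted sum of pairwise outer products of row differences.
brute = np.zeros((d, d))
for i in range(N):
    for j in range(N):
        dij = X[i] - X[j]
        brute += A[i, j] * np.outer(dij, dij)

# Expanded form: nothing larger than N x N or d x d is built.
r = A.sum(1)  # row sums of A
c = A.sum(0)  # column sums of A
expanded = (X.T * r) @ X + (X.T * c) @ X - X.T @ A @ X - X.T @ A.T @ X

assert np.allclose(brute, expanded)
```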
Thanks for the question. This was a great exercise and reminded me how important it is to avoid creating large temporary arrays.