An efficient way to compute a tensor
Suppose c is a d-dimensional vector. I want to calculate the following third-order tensor

T = c ∘ c ∘ c + 2 Σ_i c_i (e_i ∘ e_i ∘ e_i) − Σ_{i,j} c_i c_j (e_i ∘ e_i ∘ e_j + e_i ∘ e_j ∘ e_i + e_i ∘ e_j ∘ e_j),

where e_i denotes the i-th standard basis vector of the Euclidean space and ∘ is the outer product. Is there an efficient way to calculate this? I am using the following for-loop and Kruskal tensors (ktensor) from the Tensor Toolbox developed by Sandia National Laboratories:
x = ktensor({c,c,c});
I = eye(d);
for i = 1:d
    x = x + 2*c(i)*ktensor({I(:,i),I(:,i),I(:,i)});
end
for i = 1:d
    for j = 1:d
        x = x - c(i)*c(j)*(ktensor({I(:,i),I(:,i),I(:,j)}) + ktensor({I(:,i),I(:,j),I(:,i)}) + ktensor({I(:,i),I(:,j),I(:,j)}));
    end
end
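For comparison outside MATLAB, the same loop construction can be sketched in NumPy. This is my own translation of the loops above (the variable names and the example values of `d` and `c` are mine, not from the question):

```python
import numpy as np

d = 4
c = np.arange(1.0, d + 1)   # example vector, c_i = i
e = np.eye(d)

# first term: the rank-one outer product c (x) c (x) c
x = np.einsum('i,j,k->ijk', c, c, c)

# second term: 2 * c_i on each (i,i,i) "diagonal" entry
for i in range(d):
    x += 2 * c[i] * np.einsum('a,b,g->abg', e[:, i], e[:, i], e[:, i])

# third term: three permutations of e_i, e_j weighted by c_i * c_j
for i in range(d):
    for j in range(d):
        x -= c[i] * c[j] * (
            np.einsum('a,b,g->abg', e[:, i], e[:, i], e[:, j])
            + np.einsum('a,b,g->abg', e[:, i], e[:, j], e[:, i])
            + np.einsum('a,b,g->abg', e[:, i], e[:, j], e[:, j]))
```

For an entry with three distinct indices only the first term survives, e.g. `x[0, 1, 2] = c_1 * c_2 * c_3 = 6` for this example `c`.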
Here's a possibility.
- For the second term I used an indexing optimization, since it just places the values of c along the "diagonal" of the tensor.
- There is not much room for optimization in the first term, as it is a dense outer product, so bsxfun seems appropriate.
- For the third term I also stick with bsxfun, but since the result is somewhat sparse, you may want to construct it manually if your matrix is large.
Here is the code:
dim = 10;
c = [1:dim]';
e = eye(dim);
x = zeros([dim, dim, dim]);
% initialize with second term
x(1:dim*(dim+1)+1:end) = 2 * c;
% add first term
x = x + bsxfun(@times, bsxfun(@times, c, shiftdim(c, -1)), shiftdim(c, -2));
% add third term
x = x - sum(sum(bsxfun(@times, shiftdim(c*c',-3), ...
bsxfun(@times, bsxfun(@times, permute(e, [1, 3, 4, 2, 5]), permute(e, [3, 1, 4, 2, 5])), permute(e, [3, 4, 1, 5, 2])) +...
bsxfun(@times, bsxfun(@times, permute(e, [1, 3, 4, 2, 5]), permute(e, [3, 1, 4, 5, 2])), permute(e, [3, 4, 1, 2, 5])) +...
bsxfun(@times, bsxfun(@times, permute(e, [1, 3, 4, 5, 2]), permute(e, [3, 1, 4, 2, 5])), permute(e, [3, 4, 1, 2, 5]))), 5), 4);
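The same broadcasting strategy carries over to NumPy. Here is a sketch of my own (the variable names are illustrative) that builds all three terms by broadcasting and fancy indexing, without any five-dimensional intermediates, assuming a dense d×d×d array fits in memory:

```python
import numpy as np

d = 5
c = np.arange(1.0, d + 1)   # example vector

# first term: dense rank-one outer product via broadcasting
x = c[:, None, None] * c[None, :, None] * c[None, None, :]

# second term: write 2*c_i straight onto the (i,i,i) diagonal
idx = np.arange(d)
x[idx, idx, idx] += 2 * c

# third term: the three sparse permutation patterns, via index accumulation
cc = np.outer(c, c)                          # cc[i, j] = c_i * c_j
i, j = np.meshgrid(idx, idx, indexing='ij')
x3 = np.zeros((d, d, d))
np.add.at(x3, (i, i, j), cc)   # e_i (x) e_i (x) e_j
np.add.at(x3, (i, j, i), cc)   # e_i (x) e_j (x) e_i
np.add.at(x3, (i, j, j), cc)   # e_i (x) e_j (x) e_j
x -= x3
```

On the diagonal the entries reduce to c_i^3 + 2*c_i − 3*c_i^2 = c_i(c_i − 1)(c_i − 2), which gives a quick sanity check.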
EDIT
A more efficient (e.g. in terms of memory) computation of the third term:
ec = bsxfun(@times, e, c);
x = x - ...
bsxfun(@times, ec, shiftdim(c, -2)) -...
bsxfun(@times, c', reshape(ec, [dim, 1, dim])) - ...
bsxfun(@times, c, reshape(ec, [1, dim, dim]));
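This memory-efficient version of the third term maps directly onto NumPy broadcasting. A sketch with my own names, where `ec` plays the role of the MATLAB variable of the same name (each pattern is a single broadcasted product of a diagonal matrix with the vector):

```python
import numpy as np

d = 5
c = np.arange(1.0, d + 1)   # example vector
ec = np.diag(c)             # column i is c_i * e_i, like bsxfun(@times, e, c)

# the three permutation patterns, each as one broadcasted product
t1 = ec[:, :, None] * c[None, None, :]   # (i,j,k): delta_ij * c_i * c_k
t2 = c[None, :, None] * ec[:, None, :]   # (i,j,k): c_j * delta_ik * c_i
t3 = c[:, None, None] * ec[None, :, :]   # (i,j,k): c_i * delta_jk * c_j
third = t1 + t2 + t3                     # subtract this from the tensor
```

Each intermediate is at most d×d×d, mirroring the memory saving of the MATLAB edit.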
You can try the Parallel Computing Toolbox, namely parfor.
x = ktensor({c,c,c});
I = eye(d);
y = zeros(d,d,d, d);
parfor i = 1:d
    y(:,:,:, i) = 2*c(i)*ktensor({I(:,i),I(:,i),I(:,i)});
end
x = x + sum(y, 4);
z = zeros(d,d,d, d,d);
parfor i = 1:d
    for j = 1:d  % only one layer of parallelization is allowed
        z(:,:,:, i,j) = c(i)*c(j)*(ktensor({I(:,i),I(:,i),I(:,j)}) + ktensor({I(:,i),I(:,j),I(:,i)}) + ktensor({I(:,i),I(:,j),I(:,j)}));
    end
end
x = x - sum(sum(z, 5), 4);
x  % is your result
It runs the ktensor commands untouched, but on separate threads; the Toolbox takes care of running the code in parallel.
This is possible because the iterations are independent: the term for c_{i+1}, c_{j+1} does not rely on the one for c_{i}, c_{j}.
Depending on the number of cores (and hyperthreads) in your system, you may see a speedup of up to the number of cores.
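The independence argument above can be illustrated outside MATLAB as well. A minimal Python sketch using a thread pool (threads rather than MATLAB workers, so this demonstrates the structure of the decomposition, not the actual speedup; the helper name is mine):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

d = 4
c = np.arange(1.0, d + 1)   # example vector
e = np.eye(d)

def second_term_slice(i):
    # one independent iteration: 2 * c_i * (e_i (x) e_i (x) e_i)
    return 2 * c[i] * np.einsum('a,b,g->abg', e[:, i], e[:, i], e[:, i])

# iterations share no state, so they can be mapped out in any order
with ThreadPoolExecutor() as pool:
    second_term = sum(pool.map(second_term_slice, range(d)))
```

Because no slice reads another slice's result, summing the mapped pieces reproduces the sequential loop exactly.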