Comparing two cell arrays for identical strings - MATLAB
I have one cell array with 40,000 rows and another with 400. I need to find the rows of the first array that match a row of the second. Note that there can be many repetitions.
The 40,000-row array looks like this:
Anna Frank
Anna George
Jane Peter
Anna George
Jane Peter
etc.
and I need to find the rows in it that match these:
Anna George
Jane Peter
The only way I have found is two nested for loops with an if inside, but this is pretty slow:
for i = 2:size(bigTable,1)
    for j = 1:size(smallTable,1)
        if sum(ismember(bigTable(i,1:2),smallTable(j,1:2))) == 2
            Total_R(size(Total_R,1)+1,1) = i;
        end
    end
end
I am assuming your input is configured like this:
bigTable =
'Anna' 'Frank'
'Anna' 'George'
'Jane' 'Peter'
'Anna' 'George'
'Jane' 'Peter'
smallTable =
'Anna' 'George'
'Jane' 'Peter'
There are two approaches to solving your case.
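Approach #1
An ismember-based approach:
Total_R = find(sum(ismember(bigTable,smallTable,'rows'),2)==2)
Note that for cell arrays of character vectors ismember ignores the 'rows' flag (with a warning) and tests each cell on its own. This line therefore reports a row whenever both of its cells occur anywhere in smallTable, so a row such as 'Anna' 'Peter' would also be counted.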
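If whole rows have to match exactly, one possible variant is to join the two columns of each row into a single key and run ismember on the keys. This is only a sketch: the '|' separator and the names bigKeys and smallKeys are my own choices, and it assumes '|' never appears in the data.

```matlab
% Example input, as assumed in this answer
bigTable   = {'Anna','Frank'; 'Anna','George'; 'Jane','Peter'; ...
              'Anna','George'; 'Jane','Peter'};
smallTable = {'Anna','George'; 'Jane','Peter'};

% Join both columns of every row into one key, e.g. 'Anna|George'.
% Assumption: the separator '|' does not occur in any cell.
bigKeys   = strcat(bigTable(:,1),   '|', bigTable(:,2));
smallKeys = strcat(smallTable(:,1), '|', smallTable(:,2));

% ismember on cell arrays of character vectors compares whole strings,
% so a row is reported only if both of its cells match the same small row.
Total_R = find(ismember(bigKeys, smallKeys))
```

For the example input this gives rows 2, 3, 4 and 5.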
Approach #2
%// Assign unique labels to each cell for both small and big cell arrays, so that
%// later on you would be dealing with numeric arrays only and
%// do not have to mess with cell arrays that were slowing you down
[unqbig,matches1,idx] = unique([bigTable(:) ; smallTable(:)])
big_labels = reshape(idx(1:numel(bigTable)),size(bigTable))
small_labels = reshape(idx(numel(bigTable)+1:end),size(smallTable))
%// Detect which rows from small_labels exactly match with those from big_labels
Total_R = find(ismember(big_labels,small_labels,'rows'))
Or replace the ismember call in the last line with a bsxfun-based implementation:
Total_R = find(any(all(bsxfun(@eq,big_labels,permute(small_labels,[3 2 1])),2),3))
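On MATLAB R2016b or later, implicit expansion lets the == operator do the same comparison without bsxfun. A minimal sketch, with the label arrays for the example input hard-coded so the snippet runs on its own; small_3d and eq_cols are names I introduced for readability.

```matlab
% Labels for the example input (unique sorts the names:
% Anna=1, Frank=2, George=3, Jane=4, Peter=5)
big_labels   = [1 2; 1 3; 4 5; 1 3; 4 5];
small_labels = [1 3; 4 5];

% Move the small rows to the third dimension: size 1-by-2-by-M
small_3d = permute(small_labels, [3 2 1]);

% N-by-2-by-M logical array; entry (i,c,j) is true when column c of
% big row i equals column c of small row j
eq_cols = big_labels == small_3d;

% A big row matches when all columns agree (dim 2) for any small row (dim 3)
Total_R = find(any(all(eq_cols, 2), 3))
```

The intermediate array holds N*2*M logical values, about 32 million for 40,000 and 400 rows, so memory use grows with both sizes.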
The output of these approaches for the assumed input is
Total_R = 2 3 4 5
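To see what the labelling step produces, here is Approach #2 run end to end on the example input, with the intermediate values noted in comments. The isequal-based loop at the end is a plain reference check added for comparison; it is not one of the approaches above, and the names names and ref are my own.

```matlab
bigTable   = {'Anna','Frank'; 'Anna','George'; 'Jane','Peter'; ...
              'Anna','George'; 'Jane','Peter'};
smallTable = {'Anna','George'; 'Jane','Peter'};

% unique returns the sorted distinct names and, in idx, the position of
% each input cell in that sorted list
[names, ~, idx] = unique([bigTable(:); smallTable(:)]);
% names = {'Anna';'Frank';'George';'Jane';'Peter'}

big_labels   = reshape(idx(1:numel(bigTable)), size(bigTable));
% big_labels = [1 2; 1 3; 4 5; 1 3; 4 5]
small_labels = reshape(idx(numel(bigTable)+1:end), size(smallTable));
% small_labels = [1 3; 4 5]

% Numeric arrays support the 'rows' option, so whole rows are compared
Total_R = find(ismember(big_labels, small_labels, 'rows'));

% Reference check: slow but direct row-by-row comparison
ref = [];
for i = 1:size(bigTable,1)
    for j = 1:size(smallTable,1)
        if isequal(bigTable(i,:), smallTable(j,:))
            ref(end+1,1) = i; % record the row once, then move on
            break
        end
    end
end
assert(isequal(Total_R, ref))
```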