Comparing two arrays of cells for identical strings - MATLAB

I have a cell array with 40,000 rows and another one with 400 rows. I need to find the rows of the first array that match a row of the second. Note that there can be many repeated rows.

The 40,000-row array looks like this:

Anna Frank  
Anna George  
Jane Peter  
Anna George  
Jane Peter    
etc.

I need to find matches for rows such as:

Anna George  
Jane Peter  

The only way I have found is to compare the two tables with nested for loops and an if test, but this is pretty slow:

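Total_R = [];                    % must exist before the loop appends to it
for i = 2:size(bigTable,1)
    for j = 1:size(smallTable,1)
        % compare both columns position by position (ismember would ignore order)
        if all(strcmp(bigTable(i,1:2), smallTable(j,1:2)))
            Total_R(end+1,1) = i;
            break                % row i already matched; skip the remaining small rows
        end
    end
end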


Answer


I am assuming your input is set up like this:

bigTable = {'Anna'    'Frank'
            'Anna'    'George'
            'Jane'    'Peter'
            'Anna'    'George'
            'Jane'    'Peter'};
smallTable = {'Anna'    'George'
              'Jane'    'Peter'};

      

There are two approaches to solving your case.

Approach # 1

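An ismember-based approach:

Total_R = find(sum(ismember(bigTable,smallTable),2)==2)

Note that this tests each cell on its own. ismember does not support the 'rows' option for cell arrays of strings, so it reports, for every string in bigTable, whether it occurs anywhere in smallTable. A row is returned when both of its strings occur somewhere in smallTable, so a row such as 'Anna' 'Peter' would also be returned even though it is not a row of smallTable. Approach # 2 compares whole rows.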
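The difference between the two approaches shows up on the sample data once a made-up row ('Anna' 'Peter') is added whose names both occur in smallTable, but never together in one row. This sketch runs the element-wise ismember test of Approach # 1 and the label-based test of Approach # 2 side by side; only the first returns the extra row.

```matlab
% Sample input from above plus one hypothetical extra row (row 6) that is NOT a row of smallTable
bigTable = {'Anna' 'Frank'
            'Anna' 'George'
            'Jane' 'Peter'
            'Anna' 'George'
            'Jane' 'Peter'
            'Anna' 'Peter'};
smallTable = {'Anna' 'George'
              'Jane' 'Peter'};

% Approach # 1: every cell is looked up independently in all of smallTable
R1 = find(sum(ismember(bigTable,smallTable),2)==2);

% Approach # 2: strings become numeric labels, then whole rows are compared
[~,~,idx] = unique([bigTable(:) ; smallTable(:)]);
big_labels   = reshape(idx(1:numel(bigTable)),size(bigTable));
small_labels = reshape(idx(numel(bigTable)+1:end),size(smallTable));
R2 = find(ismember(big_labels,small_labels,'rows'));

disp(R1.')   % row indices 2 3 4 5 6 (row 6 is a false positive)
disp(R2.')   % row indices 2 3 4 5
```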

      

Approach # 2



%// Assign a numeric label to every string in both the big and the small cell arrays,
%// so that later on you deal with numeric arrays only and do not have to
%// work with the cell arrays that were slowing you down
[~,~,idx] = unique([bigTable(:) ; smallTable(:)]);
big_labels = reshape(idx(1:numel(bigTable)),size(bigTable));
small_labels = reshape(idx(numel(bigTable)+1:end),size(smallTable));

%// Find the rows of big_labels that exactly match some row of small_labels
Total_R = find(ismember(big_labels,small_labels,'rows'))
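To make the labelling step concrete, this sketch runs it on the sample input from above and shows the labels it produces. unique sorts the strings (Anna, Frank, George, Jane, Peter), and its third output replaces each string with its position in that sorted list, so equal strings get equal numbers in both tables.

```matlab
bigTable = {'Anna' 'Frank'
            'Anna' 'George'
            'Jane' 'Peter'
            'Anna' 'George'
            'Jane' 'Peter'};
smallTable = {'Anna' 'George'
              'Jane' 'Peter'};

% unq holds the sorted unique strings; idx(k) is the label of the k-th input string
[unq,~,idx] = unique([bigTable(:) ; smallTable(:)]);

% (:) stacks the cells column by column, so reshape restores the original layout
big_labels   = reshape(idx(1:numel(bigTable)),size(bigTable));
small_labels = reshape(idx(numel(bigTable)+1:end),size(smallTable));

disp(big_labels)     % labels: 1 2; 1 3; 4 5; 1 3; 4 5
disp(small_labels)   % labels: 1 3; 4 5
```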

      

Alternatively, replace the ismember call in the last line of Approach # 2 with a bsxfun-based implementation:

Total_R = find(any(all(bsxfun(@eq,big_labels,permute(small_labels,[3 2 1])),2),3))
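In MATLAB R2016b and later, relational operators support implicit expansion, so the same comparison can be written without bsxfun. A minimal sketch using the labels of the sample input (hard-coded here for brevity): big_labels is N-by-2 and permute(small_labels,[3 2 1]) is 1-by-2-by-M, so either form builds an N-by-2-by-M logical array. For 40,000 and 400 rows that is 32 million elements, so memory use grows with the product of the two table sizes.

```matlab
big_labels   = [1 2; 1 3; 4 5; 1 3; 4 5];   % labels of the sample bigTable
small_labels = [1 3; 4 5];                  % labels of the sample smallTable

% N-by-2-by-M comparison via implicit expansion (R2016b or later)
hits = big_labels == permute(small_labels,[3 2 1]);

% all(...,2): both columns equal; any(...,3): equal to at least one small row
R_implicit = find(any(all(hits,2),3));

% Same computation with bsxfun, as in the answer
R_bsxfun = find(any(all(bsxfun(@eq,big_labels,permute(small_labels,[3 2 1])),2),3));

disp(R_implicit.')   % row indices 2 3 4 5
```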

      


The output of these approaches for the sample input is

Total_R =
     2
     3
     4
     5
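To try the label-based approach on data of the size described in the question, here is a sketch that builds hypothetical random tables (the names in firstNames and lastNames are made up, and drawing from a small pool produces many repeated rows). It times both variants of Approach # 2, checks that they agree, and compares them with a brute-force loop on the first 500 rows only, since the full loop is the slow part. The printed timings depend on the machine and MATLAB version.

```matlab
% Hypothetical name pools (column cells, so indexing returns column cells)
firstNames = {'Anna';'Jane';'John';'Mary';'Paul'};
lastNames  = {'Frank';'George';'Peter';'Smith';'Brown'};

bigTable   = [firstNames(randi(5,40000,1)), lastNames(randi(5,40000,1))];
smallTable = [firstNames(randi(5,400,1)),   lastNames(randi(5,400,1))];

tic
[~,~,idx] = unique([bigTable(:) ; smallTable(:)]);
big_labels   = reshape(idx(1:numel(bigTable)),size(bigTable));
small_labels = reshape(idx(numel(bigTable)+1:end),size(smallTable));
R_ismember = find(ismember(big_labels,small_labels,'rows'));
t1 = toc;

tic
R_bsxfun = find(any(all(bsxfun(@eq,big_labels,permute(small_labels,[3 2 1])),2),3));
t2 = toc;

fprintf('unique + ismember: %.3f s, bsxfun step: %.3f s\n', t1, t2);

% Brute-force reference on a small slice of bigTable
ref = zeros(0,1);
for i = 1:500
    for j = 1:size(smallTable,1)
        if all(strcmp(bigTable(i,:), smallTable(j,:)))
            ref(end+1,1) = i;
            break
        end
    end
end

assert(isequal(R_ismember, R_bsxfun))
assert(isequal(ref, R_ismember(R_ismember <= 500)))
```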

      
