How do I compare two unsorted lists in Matlab?

I have two lists of 2D points, specified as M x 2 - and N x 2 - matrices respectively, with M and N being possibly very large.

What's the fastest way to determine how many are equal?

+3


source to share


3 answers


I'm not sure if you want to count the duplicate records, but if you could not use intersect

or some pretty intuitive sort based algorithm (see below). I would not prefer the nested loop version ...

function test_compareVecs()
    %% create some random data
    N = 31415;
    M1 = 100000;
    M2 = 200000;
    vec = rand(N,2);
    v1 = [rand(M1-N,2); vec];
    v2 = [rand(M2-N,2); vec];
    v1 = v1(randperm(M1),:);
    v2 = v2(randperm(M2),:);

    %% intersect
    disp('intersect:');
    tic
    s = size(intersect(v1,v2,'rows'),1);
    toc;
    s

    %% alternative approach
    disp('alternative approach:');
    tic;
    s = compareVecs(v1,v2);
    toc;
    s    
end

function s = compareVecs(v1,v2)
    %% create help vector
    help_vec = [[v1,zeros(size(v1,1),1)]; ...
                [v2,ones(size(v2,1),1)]]; 

    %% sort by first column
    % note: for some reason "sortrows(help_vec,1)" is slower
    hash_vec = help_vec(:,1); % dummy hash
    [~,sidx] = sort(hash_vec);
    help_vec = help_vec(sidx,:);

    %% diff + compare
    help_vec = diff(help_vec);    
    s = sum(help_vec(:,1) == 0 & ...
            help_vec(:,2) == 0 & ...
            help_vec(:,3) ~= 0);
end

      



Result

intersect:
Elapsed time is 0.145717 seconds.
s = 31415

alternative approach:
Elapsed time is 0.048084 seconds.
s = 31415

      

+2


source


Calculate all paired distances using pdist2

, and then count the pairs with zero distance. If the coordinates are floating point values, you can use tolerance instead of comparing to zero:



%// Data:
M = 10;
N = 8;
listM = randi(10,M,2)-1;
listN = randi(10,N,2)-1;
tol = 1e-6;
%// Distance matrix:
d = pdist2(listM, listN);
%// Count:
count = sum(d(:)<tol);

      

0


source


This should work regardless of the order of the points in each list or their length. This is a hash table / dictionary solution that should be fast but with memory requirements linear in the length of the lists. Please note that the syntax below may not be ideal, but a short reference to the basic data structures mentioned should make a trivial adjustment.

(1) fill the containers.Map dictionary in such a way that the key is a unique function of the dots, eg. num2str(M(i,1))'-'num2str(M(i,2))

...

(2) Then go to all the elements of the second list, create a key in the same way as in (1) and check if it exists. If so, set the map(key)=1

else parameter to 0. At the end, all common dot keys will be stored in 1, and the rest will be zeros.

(3) Complete the summation over the map values ​​(something like sum(map.values())

) which should give you the total number of unique intersections between the two sets, regardless of the order of those points displayed in each list.

OBS: If you don't want to count only unique intersections, but all duplicate points , in (2), instead of creating map(key)=1

, add 1 to map(key)

. The rest is the same.

0


source







All Articles