How do I compare two unsorted lists in Matlab?

Question

How do I compare two unsorted lists in Matlab?

I have two lists of 2D points, specified as M x 2 - and N x 2 - matrices respectively, with M and N being possibly very large.

What's the fastest way to determine how many are equal?

+3

matlab

jolo 28 nov. '14 at 7:12

source to share

3 answers

Calculate all paired distances using pdist2

, and then count the pairs with zero distance. If the coordinates are floating point values, you can use tolerance instead of comparing to zero:

%// Data:
M = 10;
N = 8;
listM = randi(10,M,2)-1;
listN = randi(10,N,2)-1;
tol = 1e-6;
%// Distance matrix:
d = pdist2(listM, listN);
%// Count:
count = sum(d(:)<tol);

0

Luis mendo 28 nov. 14 at 11:36

source to share

This should work regardless of the order of the points in each list or their length. This is a hash table / dictionary solution that should be fast but with memory requirements linear in the length of the lists. Please note that the syntax below may not be ideal, but a short reference to the basic data structures mentioned should make a trivial adjustment.

(1) fill the containers.Map dictionary in such a way that the key is a unique function of the dots, eg. num2str(M(i,1))'-'num2str(M(i,2))

...

(2) Then go to all the elements of the second list, create a key in the same way as in (1) and check if it exists. If so, set the map(key)=1

else parameter to 0. At the end, all common dot keys will be stored in 1, and the rest will be zeros.

(3) Complete the summation over the map values (something like sum(map.values())

) which should give you the total number of unique intersections between the two sets, regardless of the order of those points displayed in each list.

OBS: If you don't want to count only unique intersections, but all duplicate points , in (2), instead of creating map(key)=1

, add 1 to map(key)

. The rest is the same.

0

Lord henry wotton 29 nov. 14 at 10:51

source to share

matheburg · Accepted Answer · 2014-11-28T07:25:08+0000

I'm not sure if you want to count the duplicate records, but if you could not use intersect

or some pretty intuitive sort based algorithm (see below). I would not prefer the nested loop version ...

function test_compareVecs()
    %% create some random data
    N = 31415;
    M1 = 100000;
    M2 = 200000;
    vec = rand(N,2);
    v1 = [rand(M1-N,2); vec];
    v2 = [rand(M2-N,2); vec];
    v1 = v1(randperm(M1),:);
    v2 = v2(randperm(M2),:);

    %% intersect
    disp('intersect:');
    tic
    s = size(intersect(v1,v2,'rows'),1);
    toc;
    s

    %% alternative approach
    disp('alternative approach:');
    tic;
    s = compareVecs(v1,v2);
    toc;
    s    
end

function s = compareVecs(v1,v2)
    %% create help vector
    help_vec = [[v1,zeros(size(v1,1),1)]; ...
                [v2,ones(size(v2,1),1)]]; 

    %% sort by first column
    % note: for some reason "sortrows(help_vec,1)" is slower
    hash_vec = help_vec(:,1); % dummy hash
    [~,sidx] = sort(hash_vec);
    help_vec = help_vec(sidx,:);

    %% diff + compare
    help_vec = diff(help_vec);    
    s = sum(help_vec(:,1) == 0 & ...
            help_vec(:,2) == 0 & ...
            help_vec(:,3) ~= 0);
end

Result

intersect:
Elapsed time is 0.145717 seconds.
s = 31415

alternative approach:
Elapsed time is 0.048084 seconds.
s = 31415

How do I compare two unsorted lists in Matlab?

More articles: