How do I compare two unsorted lists in Matlab?
I'm not sure if you want to count the duplicate records, but if you could not use intersect
or some pretty intuitive sort based algorithm (see below). I would not prefer the nested loop version ...
function test_compareVecs()
%% create some random data
N = 31415;
M1 = 100000;
M2 = 200000;
vec = rand(N,2);
v1 = [rand(M1-N,2); vec];
v2 = [rand(M2-N,2); vec];
v1 = v1(randperm(M1),:);
v2 = v2(randperm(M2),:);
%% intersect
disp('intersect:');
tic
s = size(intersect(v1,v2,'rows'),1);
toc;
s
%% alternative approach
disp('alternative approach:');
tic;
s = compareVecs(v1,v2);
toc;
s
end
function s = compareVecs(v1,v2)
%% create help vector
help_vec = [[v1,zeros(size(v1,1),1)]; ...
[v2,ones(size(v2,1),1)]];
%% sort by first column
% note: for some reason "sortrows(help_vec,1)" is slower
hash_vec = help_vec(:,1); % dummy hash
[~,sidx] = sort(hash_vec);
help_vec = help_vec(sidx,:);
%% diff + compare
help_vec = diff(help_vec);
s = sum(help_vec(:,1) == 0 & ...
help_vec(:,2) == 0 & ...
help_vec(:,3) ~= 0);
end
Result
intersect:
Elapsed time is 0.145717 seconds.
s = 31415
alternative approach:
Elapsed time is 0.048084 seconds.
s = 31415
source to share
Calculate all paired distances using pdist2
, and then count the pairs with zero distance. If the coordinates are floating point values, you can use tolerance instead of comparing to zero:
%// Data: M = 10; N = 8; listM = randi(10,M,2)-1; listN = randi(10,N,2)-1; tol = 1e-6; %// Distance matrix: d = pdist2(listM, listN); %// Count: count = sum(d(:)<tol);
source to share
This should work regardless of the order of the points in each list or their length. This is a hash table / dictionary solution that should be fast but with memory requirements linear in the length of the lists. Please note that the syntax below may not be ideal, but a short reference to the basic data structures mentioned should make a trivial adjustment.
(1) fill the containers.Map dictionary in such a way that the key is a unique function of the dots, eg. num2str(M(i,1))'-'num2str(M(i,2))
...
(2) Then go to all the elements of the second list, create a key in the same way as in (1) and check if it exists. If so, set the map(key)=1
else parameter to 0. At the end, all common dot keys will be stored in 1, and the rest will be zeros.
(3) Complete the summation over the map values (something like sum(map.values())
) which should give you the total number of unique intersections between the two sets, regardless of the order of those points displayed in each list.
OBS: If you don't want to count only unique intersections, but all duplicate points , in (2), instead of creating map(key)=1
, add 1 to map(key)
. The rest is the same.
source to share