MATLAB - find and number duplicates in an array
I have an array of values, some of which have duplicates, for example:
a = [5;5;4;7;7;3;3;9;5;7]
and I would like to find the duplicates and number each duplicated value sequentially, setting the non-duplicates to zero. For example:
b = [1;1;0;2;2;3;3;0;1;2]
I currently have a very inefficient and incomplete approach using the unique function and various for loops and if statements, but feel like there should be a simple answer.
What's the most efficient way to get this answer?
Here is a two-liner that also works for non-sequential duplicates:
[c, ia, ic] = unique(a, 'stable');
[~, b] = ismember(a, a(ia(accumarray(ic,1)>1)));
I used some ideas from @excaza's answer, with modifications.
You can use a combination of unique, accumarray, and ismember to make the necessary adjustments:
a = [5;5;4;7;7;3;3;9];

% Identify unique values and their counts
[uniquevals, ~, ia] = unique(a, 'stable');  % Stable keeps it in the same order
bincounts = accumarray(ia, 1);  % Count the frequency of each index in ia

% Zero out singles
singles = uniquevals(bincounts <= 1);
[~, singleidx] = intersect(a, singles);
a(singleidx) = 0;

% Overwrite repeats
repeats = uniquevals(bincounts > 1);
[~, a] = ismember(a, repeats);
Which returns a new a:
a = 1 1 0 2 2 3 3 0
Step by step guide
We use unique here to find all of the unique values in our input array a. We also store the optional third output, which maps each value of a to its index in the array of unique values. Note that we are using the 'stable' option to obtain the unique values in the order they are first encountered in a; the results of unique are sorted by default.
We then use accumarray to accumulate the indices obtained from unique, which gives us a count of each index. Using logical indexing, we first use these counts to zero out the single instances. Once they are zeroed out, we can abuse the second output of ismember to return the final answer.
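As a rough illustration of the steps above, here is a sketch that prints the intermediate values for the question's full example, including the non-adjacent repeats. The output name b is borrowed from the question and is not part of this answer. It also suggests that the explicit zeroing step may be optional, because the second output of ismember is already 0 for any value not found in repeats.

```matlab
% Intermediate values for the unique / accumarray / ismember approach,
% using the example input from the question.
a = [5;5;4;7;7;3;3;9;5;7];

[uniquevals, ~, ia] = unique(a, 'stable');  % uniquevals = [5;4;7;3;9]
bincounts = accumarray(ia, 1);              % bincounts  = [3;1;3;2;1]

repeats = uniquevals(bincounts > 1);        % repeats    = [5;7;3]

% The second output of ismember is the position of each element of a
% within repeats, and 0 where the element is not found, so values that
% occur only once come out as 0 without a separate zeroing step.
[~, b] = ismember(a, repeats);              % b = [1;1;0;2;2;3;3;0;1;2]
disp(b.')
```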
Here is a solution based on indexing, logical operators and cumsum:
x = [false; a(2:end)==a(1:end-1)]; %logical indexes of repeated elements except the first element of each block
y = [x(2:end)|x(1:end-1) ;x(end)]; %logical indexes of repeated elements
result = cumsum(~x&y).*y %cumsum(...):number all elements sequentially and (... .* y): making non-duplicates zero
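To see how the two masks combine, here is a small sketch that traces the intermediate vectors for an input with consecutive duplicates only. The variable name starts is introduced here purely for illustration and does not appear in the original answer.

```matlab
% Trace of the mask-based approach for consecutive duplicates.
a = [5;5;4;7;7;3;3;9];

x = [false; a(2:end) == a(1:end-1)];   % x = [0;1;0;0;1;0;1;0]  equals its predecessor
y = [x(2:end) | x(1:end-1); x(end)];   % y = [1;1;0;1;1;1;1;0]  belongs to a run of repeats
starts = ~x & y;                       % starts = [1;0;0;1;0;1;0;0]  first element of each run

% cumsum(starts) numbers the runs; multiplying by y zeroes the non-duplicates.
result = cumsum(starts) .* y;          % result = [1;1;0;2;2;3;3;0]
disp(result.')
```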
Edit:
Since the question was edited, to handle non-consecutive duplicates you can do this:
[s ii] = sort(a);
x = [false ;s(2:end)==s(1:end-1)];
y = [x(2:end)|x(1:end-1) ;x(end)];
first = ~x&y;
[~,ix]=sort(ii(first));
un(ix,1)=1:numel(ix);
result(ii,1)=un(max(cumsum(first),1)).*y; %max(...,1) avoids indexing un with 0 when the smallest value is not a duplicate
Here's a different approach:
a = [5;5;4;7;7;3;3;9;5;7];
[u, ~, w] = unique(a, 'stable');
s = find(sum(bsxfun(@eq, a, u.'), 1) > 1);
b = sum(bsxfun(@times, bsxfun(@eq, w, s), 1:numel(s)), 2);
In R2016b onwards, you can simplify the syntax:
a = [5;5;4;7;7;3;3;9;5;7];
[u, ~, w] = unique(a, 'stable');
s = find(sum(a==u.', 1) > 1);
b = sum((w==s).*(1:numel(s)), 2);