MATLAB - find and number duplicates in an array

I have an array of values, some of which have duplicates, for example:

a = [5;5;4;7;7;3;3;9;5;7]

      

and I would like to find the duplicated values and number each one sequentially, setting the values that have no duplicates to zero. For example:

b = [1;1;0;2;2;3;3;0;1;2]

      

I currently have a very inefficient and incomplete approach using the unique function and various for loops and if statements, but I feel like there should be a simple answer.

What's the most efficient way to get this answer?



4 answers


Here is a two-liner that also works for non-consecutive duplicates:

[c, ia, ic] = unique(a, 'stable');
[~, b] = ismember(a, a(ia(accumarray(ic,1)>1)));

      



I used some ideas from @excaza's answer, with modifications.



You can use a combination of unique, accumarray, and ismember to make the necessary adjustments:

a = [5;5;4;7;7;3;3;9];

% Identify unique values and their counts
[uniquevals, ~, ia] = unique(a, 'stable');  % Stable keeps it in the same order
bincounts = accumarray(ia, 1);  % Count the frequency of each index in ia

% Zero out singles
singles = uniquevals(bincounts <= 1);
[~, singleidx] = intersect(a, singles);
a(singleidx) = 0;

% Overwrite repeats
repeats = uniquevals(bincounts > 1);
[~, a] = ismember(a, repeats);

      

Which returns a new a (displayed here transposed):

a =

     1     1     0     2     2     3     3     0

      




Step-by-step guide

We use unique here to find all the unique values in our input array a. We also store the optional third output, which maps each value of a to its index in the array of unique values. Note that we use the 'stable' option to get the unique values in the order they are first encountered in a; the results of unique are sorted by default.

We then use accumarray to accumulate the indices obtained from unique, which gives us the count of each index. Using logical indexing, we first use these counts to zero out the single instances. After clearing them, we can abuse the second output of ismember to return the final answer.



Here is a solution based on indexing, logical operators, and cumsum:

x = [false; a(2:end)==a(1:end-1)]; % logical index of repeated elements, excluding the first element of each block
y = [x(2:end)|x(1:end-1); x(end)]; % logical index of all repeated elements
result = cumsum(~x&y).*y           % cumsum(...) numbers the blocks sequentially; .*y sets non-duplicates to zero
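
As a rough illustration of how the three lines interact, the sketch below spells out the intermediate vectors for a short input that contains only consecutive duplicates. The name firstOfRun is introduced here for readability and is not part of the original answer.

```matlab
a = [5;5;4;7;7;3;3;9];

% True where an element equals its predecessor
x = [false; a(2:end) == a(1:end-1)];      % x = [0;1;0;0;1;0;1;0]

% True for every element that belongs to a block of equal neighbours
y = [x(2:end) | x(1:end-1); x(end)];      % y = [1;1;0;1;1;1;1;0]

% True only at the first element of each block
firstOfRun = ~x & y;                      % [1;0;0;1;0;1;0;0]

% Running count of blocks, zeroed where the element is not a duplicate
result = cumsum(firstOfRun) .* y;         % [1;1;0;2;2;3;3;0]
```

On the question's ten-element array the trailing 5 and 7 are not adjacent to their earlier copies, so this version leaves them at 0.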

      

Edit:

Following the edit to the question, to handle non-consecutive duplicates you can do this:

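[s, ii] = sort(a);                    % sort so that equal values become adjacent
x = [false; s(2:end)==s(1:end-1)];    % repeated elements, excluding the first of each block
y = [x(2:end)|x(1:end-1); x(end)];    % all elements that belong to a block of duplicates
first = ~x&y;                         % first element of each block
[~, ix] = sort(ii(first));            % order the blocks by first appearance in a
un(ix,1) = 1:numel(ix);               % label assigned to each block
result(ii,1) = un(cumsum(first)).*y;  % map the labels back to the original order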

      



Here's a different approach:

a = [5;5;4;7;7;3;3;9;5;7];
[u, ~, w] = unique(a, 'stable');
s = find(sum(bsxfun(@eq, a, u.'), 1) > 1);
b = sum(bsxfun(@times, bsxfun(@eq, w, s), 1:numel(s)), 2);

      

From R2016b onwards, implicit expansion lets you simplify the syntax:

a = [5;5;4;7;7;3;3;9;5;7];
[u, ~, w] = unique(a, 'stable');
s = find(sum(a==u.', 1) > 1);
b = sum((w==s).*(1:numel(s)), 2);
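
For readers who want to see what the comparison matrices contain, here is a small sketch using the R2016b syntax. The names counts and M are introduced here for illustration and do not appear in the original answer.

```matlab
a = [5;5;4;7;7;3;3;9;5;7];
[u, ~, w] = unique(a, 'stable');   % u = [5;4;7;3;9]

% a == u.' is a 10-by-5 logical matrix; column k marks where a equals u(k)
counts = sum(a == u.', 1);         % counts = [3 1 3 2 1]
s = find(counts > 1);              % s = [1 3 4], indices into u of the repeated values

% w == s is 10-by-3; column j marks the elements equal to u(s(j)).
% Multiplying by 1:numel(s) puts label j in column j, and each row
% has at most one nonzero entry, so the row sum is the label itself
M = (w == s) .* (1:numel(s));
b = sum(M, 2);                     % b = [1;1;0;2;2;3;3;0;1;2]
```

Note that this approach builds a numel(a)-by-numel(u) matrix, so its memory use grows with the number of distinct values.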

      
