MATLAB - search and number of duplicates in an array

Question

I have an array of values, some of which have duplicates, for example:

a = [5;5;4;7;7;3;3;9;5;7]

and I would like to find the duplicates and number each duplicated value sequentially, setting values that have no duplicate to zero. For example:

b = [1;1;0;2;2;3;3;0;1;2]

I currently have a very inefficient and incomplete approach using the unique function and various for loops and if statements, but I feel like there should be a simple answer.

What's the most efficient way to get this answer?

+3

matlab unique

user3743235, June 15 '17 at 16:20

4 answers

You can use a combination of unique, accumarray, and ismember to make the necessary adjustments:

a = [5;5;4;7;7;3;3;9];

% Identify unique values and their counts
[uniquevals, ~, ia] = unique(a, 'stable');  % Stable keeps it in the same order
bincounts = accumarray(ia, 1);  % Count the frequency of each index in ia

% Zero out singles
singles = uniquevals(bincounts <= 1);
[~, singleidx] = intersect(a, singles);
a(singleidx) = 0;

% Overwrite repeats
repeats = uniquevals(bincounts > 1);
[~, a] = ismember(a, repeats);

Which returns a new a (a column vector, displayed here as a row for brevity):

a =

     1     1     0     2     2     3     3     0

Step by step guide

We use unique here to find all the unique values in our input array a. We also store the optional third output, which maps each value of a to its index in the array of unique values. Note that we use the 'stable' option to obtain the unique values in the order they are first encountered in a; the results of unique are sorted by default.

We then use accumarray to accumulate the indices obtained from unique, which gives us the count of each index. Using logical indexing, we first use these counts to zero out the single instances. After zeroing them out, we can use the second output of ismember to return the final answer.
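As a rough illustration of the counting step described above, the following sketch prints nothing but annotates the intermediate values for the same 8-element input. The variable names follow the answer; the values in the comments were worked out by hand for this particular input.

```matlab
a = [5;5;4;7;7;3;3;9];

[uniquevals, ~, ia] = unique(a, 'stable');
% uniquevals = [5;4;7;3;9]          unique values in order of first appearance
% ia         = [1;1;2;3;3;4;4;5]    position of each a(k) within uniquevals

bincounts = accumarray(ia, 1);
% bincounts  = [2;1;2;2;1]          number of occurrences of each unique value

singles = uniquevals(bincounts <= 1);   % [4;9]   values that occur once
repeats = uniquevals(bincounts > 1);    % [5;7;3] values that occur more than once
```

The position of a value within repeats is exactly the label it should receive, which is why the second output of ismember gives the final answer.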

+2

excaza, June 15 '17 at 16:46

Here is a solution based on indexing, logical operators and cumsum:

x = [false; a(2:end)==a(1:end-1)]; % true for repeated elements, except the first element of each block
y = [x(2:end)|x(1:end-1); x(end)]; % true for every element that belongs to a block of repeats
result = cumsum(~x&y).*y           % cumsum numbers the blocks sequentially; .*y sets non-duplicates to zero
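To make the role of each mask concrete, here is a hand-worked trace of the three lines above. The name first is introduced only for this illustration. The second half applies the same lines to the 10-element vector from the question, as a sketch of why adjacency matters: the trailing 5 and 7 are not next to their earlier copies, so they are not labelled.

```matlab
% Adjacent duplicates: the approach works as intended
a = [5;5;4;7;7;3;3;9];
x = [false; a(2:end)==a(1:end-1)];   % x      = [0;1;0;0;1;0;1;0]
y = [x(2:end)|x(1:end-1); x(end)];   % y      = [1;1;0;1;1;1;1;0]
first = ~x & y;                      % first  = [1;0;0;1;0;1;0;0]
result = cumsum(first).*y;           % result = [1;1;0;2;2;3;3;0]

% Non-adjacent duplicates: the trailing 5 and 7 end up as 0
a10 = [5;5;4;7;7;3;3;9;5;7];
x10 = [false; a10(2:end)==a10(1:end-1)];
y10 = [x10(2:end)|x10(1:end-1); x10(end)];
result10 = cumsum(~x10 & y10).*y10;  % result10 = [1;1;0;2;2;3;3;0;0;0]
```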

Edit:

Since the question was edited, to handle non-consecutive duplicates you can do this:

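[s, ii] = sort(a);                    % sort so that equal values become adjacent
x = [false; s(2:end)==s(1:end-1)];    % repeated elements, except the first of each block
y = [x(2:end)|x(1:end-1); x(end)];    % every element that belongs to a block of repeats
first = ~x & y;                       % first element of each block of repeats
[~, ix] = sort(ii(first));            % order the blocks by first appearance in a
un = zeros(numel(ix), 1);
un(ix) = 1:numel(ix);                 % label assigned to each block
c = cumsum(first);                    % block number of each sorted element
result = zeros(size(a));
result(ii(y)) = un(c(y));             % index only where y is true, so c is never 0 here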

+2

rahnema1, June 15 '17 at 17:17

Here's a different approach:

a = [5;5;4;7;7;3;3;9;5;7];
[u, ~, w] = unique(a, 'stable');
s = find(sum(bsxfun(@eq, a, u.'), 1) > 1);
b = sum(bsxfun(@times, bsxfun(@eq, w, s), 1:numel(s)), 2);

From R2016b onwards, implicit expansion lets you drop bsxfun and simplify the syntax:

a = [5;5;4;7;7;3;3;9;5;7];
[u, ~, w] = unique(a, 'stable');
s = find(sum(a==u.', 1) > 1);
b = sum((w==s).*(1:numel(s)), 2);
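The comparison matrices are the core of this approach, so a small hand-worked example may help. The sketch below uses a shorter input chosen only for illustration and assumes R2016b or later for implicit expansion. Note that a==u.' has numel(a) rows and numel(u) columns, so memory use can grow quickly for large inputs with many distinct values.

```matlab
a = [5;5;4;7;7];
[u, ~, w] = unique(a, 'stable');   % u = [5;4;7], w = [1;1;2;3;3]

M = (a == u.');                    % 5-by-3: M(k,j) is true when a(k) equals u(j)
counts = sum(M, 1);                % counts = [2 1 2]
s = find(counts > 1);              % s = [1 3], indices in u of the repeated values

L = (w == s).*(1:numel(s));        % 5-by-2: L(k,j) = j when a(k) is the j-th repeated value
b = sum(L, 2);                     % b = [1;1;0;2;2]
```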

+2

Luis Mendo, June 16 '17 at 9:09

Vahe Tshitoyan · Accepted Answer · 2017-06-15T19:53:40+0000

Here is a two-liner that also works for non-consecutive duplicates:

[c, ia, ic] = unique(a, 'stable');
[~, b] = ismember(a, a(ia(accumarray(ic,1)>1)));

I used some ideas from @excaza's answer, with modifications.
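For readers who find the nested indexing dense, the two-liner can be unpacked roughly as follows. The intermediate names counts and repeats are introduced here only for illustration, and the values in the comments were worked out by hand for the vector from the question.

```matlab
a = [5;5;4;7;7;3;3;9;5;7];

[~, ia, ic] = unique(a, 'stable');  % ia = [1;3;4;6;8]: first occurrence of each unique value
counts = accumarray(ic, 1);         % counts = [3;1;3;2;1]
repeats = a(ia(counts > 1));        % repeats = [5;7;3], in order of first appearance
[~, b] = ismember(a, repeats);      % b = [1;1;0;2;2;3;3;0;1;2]
```

Because ismember returns 0 for values that are not in repeats, no separate step is needed to zero out the single instances.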
