# Indexing duplicates in a matrix: Matlab

Consider the matrix

```
X = [ 1 2 0 1;
      1 0 1 2;
      1 2 3 4;
      2 4 6 8;
      .
      .
      1 2 0 1
      .
      .    ]
```

I want to create a new column that records, for each row, which occurrence of that row it is (the `i`th occurrence).

Expected output:

```
X = [ 1 2 0 1;    y = [ 1
      1 0 1 2;          1
      1 2 3 4;          1
      2 4 6 8;          1
      .                 .
      .                 .
      1 2 0 1           2
      .                 .
      .    ]            . ]
```

Any ideas?


```
y = sum(triu(squareform(pdist(X))==0)).';
```

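It works by counting, for each row, how many rows up to and including it are equal to it. Two rows are equal if their distance (computed with `pdist` and arranged into a square matrix with `squareform`) is 0. `triu` keeps only the upper triangle, diagonal included, so each row is compared only with itself and the rows before it.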
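The counting step can be traced on a small case without any toolbox. The following is only a sketch: the toy matrix and the names `E` and `U` are introduced here for illustration, and the equality matrix is built with plain loops and `isequal` rather than with `pdist`.

```matlab
% Toy data: rows 1 and 3 are identical
X = [1 2; 3 4; 1 2];
n = size(X, 1);

% E(i,j) is true when row i equals row j
E = false(n);
for i = 1:n
    for j = 1:n
        E(i,j) = isequal(X(i,:), X(j,:));
    end
end

% Keep only pairs with i <= j (diagonal included), so column j counts
% the rows up to and including row j that are equal to row j
U = triu(E);
y = sum(U, 1).';   % y = [1; 1; 2]
```

Row 3 gets the value 2 because it matches row 1 and itself; rows 1 and 2 only match themselves.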

To shorten the computation time and avoid depending on the Statistics Toolbox, you can use @user1735003's suggestion:

```
% the trailing .' returns a column vector, as in the first version
y = sum(triu((bsxfun(@plus, sum(X.^2,2), sum(X.^2,2)') - 2*X*X.')==0)).';
```
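As a reading aid, the expression inside `triu` appears to rely on the usual expansion of the squared Euclidean distance between two rows of `X`, written here as $x_i$ and $x_j$ (notation introduced for this note):

$$\lVert x_i - x_j \rVert^2 = \lVert x_i \rVert^2 + \lVert x_j \rVert^2 - 2\, x_i \cdot x_j$$

`sum(X.^2,2)` supplies the squared norms and `X*X.'` the dot products, so in exact arithmetic an entry is 0 precisely when the two rows coincide. With non-integer data, rounding can leave a tiny nonzero residue, so the exact `==0` test may miss equal rows; for integer-valued data such as the example in the question this should not be an issue.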

Approach #1

```
% unique rows
unqrows = unique(X,'rows');

% matches for each row against the unique rows and their cumsum values
matches_perunqrow = squeeze(all(bsxfun(@eq,X,permute(unqrows,[3 2 1])),2));
cumsum_unqrows = cumsum(matches_perunqrow,1);

% go through the rows in order and get the cumsum values for the final output
[row,col] = find(matches_perunqrow);
[sorted_row,ind] = sort(row);
y = cumsum_unqrows(sub2ind(size(cumsum_unqrows),[1:size(cumsum_unqrows,1)]',col(ind)));
```

Example run:

```
X =
     1     2     0     1
     1     0     1     2
     1     2     3     4
     2     4     6     8
     1     2     0     1
     1     2     3     4
     1     2     3     4
     1     2     3     4
     1     2     3     4
     1     2     0     1
out =
     1
     1
     1
     1
     2
     2
     3
     4
     5
     3
```

Approach #2

```
% unique rows
unqrows = unique(X,'rows');

% matches for each row against the unique rows
matches_perunqrow = all(bsxfun(@eq,X,permute(unqrows,[3 2 1])),2);

% get the cumsum of matches and select only the matches for each row;
% since we need to go through the rows in order, transpose the result
cumsum_perrow = squeeze(cumsum(matches_perunqrow,1).*matches_perunqrow)';

% select the nonzero values for the final output
y = cumsum_perrow(cumsum_perrow~=0);
```

Approach #3

```
% label each row based on its uniqueness
[~,~,v3] = unique(X,'rows');
matches_perunqrow = bsxfun(@eq,v3,1:size(X,1));

cumsum_unqrows = cumsum(matches_perunqrow,1);

% go through the rows in order and get the cumsum values for the final output
[row,col] = find(matches_perunqrow);
[sorted_row,ind] = sort(row);
y = cumsum_unqrows(sub2ind(size(cumsum_unqrows),[1:size(cumsum_unqrows,1)]',col(ind)));
```

Approach #4

```
% label each row based on its uniqueness
[~,~,match_row_id] = unique(X,'rows');

% matches for each row against the unique rows and their cumsum values
matches_perunqrow = bsxfun(@eq,match_row_id',[1:size(X,1)]');
cumsum_unqrows = cumsum(matches_perunqrow,2);

% select the cumsum values for the output based on the unique matches for each row
y = cumsum_unqrows(matches_perunqrow);
```
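To see why the final logical indexing step picks the right values, here is a minimal trace on a three-row toy matrix. The toy data and the short names `ids`, `M` and `C` are introduced here for illustration only; they stand in for `match_row_id`, `matches_perunqrow` and `cumsum_unqrows`.

```matlab
% Toy data: rows 1 and 3 are identical
X = [1 2; 3 4; 1 2];

[~, ~, ids] = unique(X, 'rows');   % ids = [1; 2; 1]
ids = ids(:);                      % force a column vector

% M(i,j) is true when row j of X carries label i
M = bsxfun(@eq, ids.', (1:size(X,1)).');
% M = [1 0 1
%      0 1 0
%      0 0 0]

% Running count of each label, from left to right
C = cumsum(M, 2);
% C = [1 1 2
%      0 1 1
%      0 0 0]

% Logical indexing reads C column by column, i.e. in the row order of X
y = C(M);   % y = [1; 1; 2]
```

Each column of `M` has exactly one true entry, so `C(M)` returns one count per row of `X`, already in the original row order.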

A solution involving a `for` loop is easy to write and might already be fast enough. I'm sure there is a faster solution using `cumsum`, but you might not even need it. Basic idea: first find the indices of the unique rows, so that you can work with scalar indices instead of full rows (vectors). Then loop over the indices and count the number of occurrences so far:

```
X = [ 1 2 0 1;
      1 0 1 2;
      1 2 3 4;
      2 4 6 8;
      1 2 0 1;
      1 3 3 7;
      1 2 0 1];

[~,~,idx] = unique(X, 'rows'); % label each row by the unique row it matches

% loop over the labels and count the occurrences up to and including row i
y = zeros(size(idx));
for i = 1:length(idx)
    y(i) = sum(idx(1:i) == idx(i)); % this line probably scales poorly with the length of idx
end
```

Result for the example:

```
y =

     1
     1
     1
     1
     2
     1
     3
```
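For long inputs, the repeated `sum` inside the loop can be avoided. The following is one possible sketch of a loop-free variant, not the method the answer had in mind: it sorts the labels (MATLAB's `sort` keeps equal elements in their original order) and uses `cumsum` to number the runs of equal labels. All variable names other than `X`, `idx` and `y` are introduced here for illustration.

```matlab
X = [1 2 0 1;
     1 0 1 2;
     1 2 3 4;
     2 4 6 8;
     1 2 0 1;
     1 3 3 7;
     1 2 0 1];

[~, ~, idx] = unique(X, 'rows');   % label each row by the unique row it matches
idx = idx(:);                      % force a column vector
n = numel(idx);

% Sort the labels; equal labels stay in their original relative order
[sorted_idx, order] = sort(idx);

% Mark where each run of equal labels starts in the sorted list
is_start = [true; diff(sorted_idx) ~= 0];
start_pos = find(is_start);   % first position of each run
run_id = cumsum(is_start);    % which run each sorted element belongs to

% Rank within the run = position minus start of the run, plus one
rank_in_run = (1:n).' - start_pos(run_id) + 1;

% Scatter the ranks back to the original row order
y = zeros(n, 1);
y(order) = rank_in_run;
```

On the example above this gives the same `y` as the loop, `[1; 1; 1; 1; 2; 1; 3]`, while the dominant cost is the sort rather than a comparison against all earlier rows.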