R: remove duplicate values in different rows and columns

Question

I found many pages about finding duplicate items in a list or duplicate rows in a dataframe. However, I want to search for duplicated elements throughout the dataframe. Let's take this as an example:

df
     coupon1    coupon2    coupon3
1         10         11         12
2         13         16         15
3         16         17         18
4         19         20         21
5         22         23         24
6         25         26         27

You will notice that df[2,2] and df[3,1] hold the same value (16). When I run

duplicated(df)

it returns six FALSE values, because no entire row is duplicated, only a single element. How do I check for duplicate values anywhere in the dataframe? I would like to know that a duplicate exists and also what its value is (and the same if there are multiple duplicates).

+3

r duplicates

Kira tebbe 07 jul. 15 at 18:25

2 answers

which(duplicated(stack(yourdf)[,1]))
[1] 8
stack(yourdf)[,1][which(duplicated(stack(yourdf)[,1]))]
[1] 16

+1

user227710 07 jul. 15 at 18:34

Pierre lafortune · Accepted Answer · 2015-07-07T18:33:39+0000

This finds duplicates across the whole data frame, but it searches column by column. So (3,1) will still be FALSE, since it is the first occurrence of 16 in the data frame.

m <- matrix(duplicated(unlist(df)), ncol=ncol(df))
#      [,1]  [,2]  [,3]
#[1,] FALSE FALSE FALSE
#[2,] FALSE  TRUE FALSE
#[3,] FALSE FALSE FALSE
#[4,] FALSE FALSE FALSE
#[5,] FALSE FALSE FALSE
#[6,] FALSE FALSE FALSE

Then you can use it however you want, for example:

df[m]
#[1] 16
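
If you also want the first occurrence flagged and the positions of every match, something along these lines may work. It is only a sketch: it rebuilds the example df from the question, and the names vals, dup_vals, m_all and pos are made up for illustration.

```r
# Example data from the question
df <- data.frame(
  coupon1 = c(10, 13, 16, 19, 22, 25),
  coupon2 = c(11, 16, 17, 20, 23, 26),
  coupon3 = c(12, 15, 18, 21, 24, 27)
)

# Flatten the data frame column by column
vals <- unlist(df, use.names = FALSE)

# Each value that appears more than once, listed a single time
dup_vals <- unique(vals[duplicated(vals)])

# Logical matrix marking every occurrence, including the first one
m_all <- matrix(vals %in% dup_vals, ncol = ncol(df))

# Row and column index of every marked cell
pos <- which(m_all, arr.ind = TRUE)
```

Here dup_vals tells you which values are repeated, length(dup_vals) > 0 tells you whether any duplicate exists, and pos gives one row/col pair per occurrence. Note that this assumes all columns share a comparable type; with mixed types unlist() will coerce the values before comparing them.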
