Find unique factors in one variable that are not in another

I have two factor variables in one R data frame, each containing some duplicated values:

v1 <- c("a1","a1","b2","b2","d4","c3","d4")
v2 <- c("a1","c3","d4","d4","e5","f6","g7")
A = data.frame(v1, v2)


The goal is to return each value of v1 that doesn't occur in v2, listing each such value only once. Based on this thread, I tried the code below, but it returns "b2 b2":

A$v1[!A$v1 %in% A$v2]
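The duplicate comes from how %in% works: it returns one TRUE/FALSE for every element of v1, so negating it and subsetting keeps every copy of a value that is missing from v2. A minimal sketch using the plain vectors from above (the names in_v2 and not_in_v2 are only for illustration):

```r
v1 <- c("a1","a1","b2","b2","d4","c3","d4")
v2 <- c("a1","c3","d4","d4","e5","f6","g7")

# %in% gives one logical per element of v1, duplicates included
in_v2 <- v1 %in% v2
print(in_v2)
# [1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE

# Negating and subsetting keeps every copy that is absent from v2
not_in_v2 <- v1[!in_v2]
print(not_in_v2)
# [1] "b2" "b2"
```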


The actual data has over 50,000 cases, and each value in v1 appears up to 100 times. With the same %in% approach as above, the output is cut off after 100 results, and those results are all the same value because of the duplication in v1.

In short, how can I query the data frame above so that "b2" is returned only once?



2 answers


You can use setdiff(), which already removes duplicates:



with(A, setdiff(v1, v2))
# [1] "b2"
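A short sketch of setdiff() on the plain vectors, showing that it returns each missing value once and that the argument order matters (the names only_in_v1 and only_in_v2 are just for illustration):

```r
v1 <- c("a1","a1","b2","b2","d4","c3","d4")
v2 <- c("a1","c3","d4","d4","e5","f6","g7")

# Values of v1 that are absent from v2, each listed once
only_in_v1 <- setdiff(v1, v2)
print(only_in_v1)
# [1] "b2"

# Swapping the arguments gives the values of v2 that are absent from v1
only_in_v2 <- setdiff(v2, v1)
print(only_in_v2)
# [1] "e5" "f6" "g7"
```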



You're almost there. Just wrap it in unique():

unique(A$v1[!A$v1 %in% A$v2])


Or, working directly on the original vectors instead of the data frame columns:

unique(v1[!v1 %in% v2])
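Because unique() keeps only the first occurrence of each value, the number of repeats in v1 does not change the result. A minimal sketch that checks this against setdiff() and then uses hypothetical larger input built with rep() to mimic many repeats per value (the names via_unique, big_v1 and big_result are illustrative):

```r
v1 <- c("a1","a1","b2","b2","d4","c3","d4")
v2 <- c("a1","c3","d4","d4","e5","f6","g7")

# unique() collapses the filtered result to one entry per value
via_unique <- unique(v1[!v1 %in% v2])
via_setdiff <- setdiff(v1, v2)
identical(via_unique, via_setdiff)
# [1] TRUE

# Hypothetical larger input: the whole of v1 repeated 100 times
big_v1 <- rep(v1, times = 100)
big_result <- unique(big_v1[!big_v1 %in% v2])
length(big_v1)
# [1] 700
big_result
# [1] "b2"
```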




If you want to store the results in a new variable:

uni <- unique(A$v1[!A$v1 %in% A$v2]) 


If v1 is a factor (the data.frame() default for character columns before R 4.0.0) and you want to drop the unused levels:

uni <- droplevels(unique(A$v1[!A$v1 %in% A$v2]))
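droplevels() only matters for factors: subsetting a factor keeps all of its original levels, even those no longer present. A sketch, assuming factor columns built explicitly with factor() so it behaves the same under either stringsAsFactors default (uni_dropped is a hypothetical name):

```r
# Factor columns created explicitly, independent of stringsAsFactors
A <- data.frame(
  v1 = factor(c("a1","a1","b2","b2","d4","c3","d4")),
  v2 = factor(c("a1","c3","d4","d4","e5","f6","g7"))
)

# The unique result still carries every level of A$v1
uni <- unique(A$v1[!A$v1 %in% A$v2])
levels(uni)
# [1] "a1" "b2" "c3" "d4"

# droplevels() removes the levels that no longer occur
uni_dropped <- droplevels(uni)
levels(uni_dropped)
# [1] "b2"
```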







