Find unique factors in one variable that are not in another
I have two variables in one R dataframe, each holding factor values with some duplication:
v1 <- c("a1","a1","b2","b2","d4","c3","d4")
v2 <- c("a1","c3","d4","d4","e5","f6","g7")
A = data.frame(v1, v2)
The goal is to return every value in v1 that doesn't exist in v2, but only once for each unique value. Based on this thread, I've tried the code below, which returns "b2 b2":
A$v1[!A$v1 %in% A$v2]
The actual data I want to use has over 50,000 cases, and each value in v1 appears up to 100 times. Using the same %in% approach as above, the output truncates after 100 results are returned, but they are all the same value because of the duplication in v1.
In sum, how can I query the dataframe above and only return the "b2" value once?
You're almost there. Just wrap it in unique():
unique(A$v1[!A$v1 %in% A$v2])
Or, without first combining the vectors into a dataframe:
unique(v1[!v1 %in% v2])
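As an alternative sketch (not part of the answer itself), base R's setdiff() does the filtering and the de-duplication in one call. Note that it coerces its arguments with as.vector(), so if the inputs are factors the result comes back as a character vector rather than a factor. The name only_in_v1 is just an illustrative choice:

```r
v1 <- c("a1","a1","b2","b2","d4","c3","d4")
v2 <- c("a1","c3","d4","d4","e5","f6","g7")

# Unique values of v1 that do not occur anywhere in v2
only_in_v1 <- setdiff(v1, v2)
print(only_in_v1)
# [1] "b2"
```

This should give the same values as unique(v1[!v1 %in% v2]); which one reads better is mostly a matter of taste.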
If you want to store the results in a new variable:
uni <- unique(A$v1[!A$v1 %in% A$v2])
If v1 is a factor and you want to drop the unused levels from the result:
uni <- droplevels(unique(A$v1[!A$v1 %in% A$v2]))
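To see what droplevels() changes, here is a minimal sketch that assumes the columns really are factors. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so stringsAsFactors = TRUE is passed explicitly; that argument and the name A_fct are assumptions made for illustration, not part of the original question:

```r
v1 <- c("a1","a1","b2","b2","d4","c3","d4")
v2 <- c("a1","c3","d4","d4","e5","f6","g7")

# Build the dataframe with factor columns (assumption: stringsAsFactors = TRUE)
A_fct <- data.frame(v1, v2, stringsAsFactors = TRUE)

# Subsetting and unique() keep every level of the original factor
uni <- unique(A_fct$v1[!A_fct$v1 %in% A_fct$v2])
levels(uni)   # "a1" "b2" "c3" "d4"

# droplevels() removes the levels that no longer occur in the result
uni <- droplevels(uni)
levels(uni)   # "b2"
```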