Find unique factors in one variable that are not in another

Question

I have two variables in one R data frame, each holding factor values with some duplication:

v1 <- c("a1","a1","b2","b2","d4","c3","d4")
v2 <- c("a1","c3","d4","d4","e5","f6","g7")
A = data.frame(v1, v2)

The goal is to return every value in v1 that doesn't exist in v2, but only once for each unique value. Based on this thread, I've tried the code below, which returns "b2 b2":

A$v1[!A$v1 %in% A$v2]

The actual data I want to use has over 50,000 cases, and each value in v1 appears up to 100 times. Using the same %in% approach as above, the output is truncated after 100 results, and those results are all repeats of the same value because of the duplication in v1.

In sum, how can I query the data frame above and return the "b2" value only once?

ndporter Apr 30 At 15:01

2 answers

You're almost there. Just wrap it in unique():

unique(A$v1[!A$v1 %in% A$v2])

Or, without combining the vectors into a data frame:

unique(v1[!v1 %in% v2])

If you want to store the results in a new variable:

uni <- unique(A$v1[!A$v1 %in% A$v2])

If you want to reset the levels:

uni <- droplevels(unique(A$v1[!A$v1 %in% A$v2]))
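
The droplevels() step only matters when v1 is a factor. As a minimal sketch, assuming R 4.0.0 or later (where data.frame() no longer converts strings to factors by default), the factor conversion is requested explicitly with stringsAsFactors = TRUE so the example behaves like the data in the question:

```r
v1 <- c("a1","a1","b2","b2","d4","c3","d4")
v2 <- c("a1","c3","d4","d4","e5","f6","g7")
# Request factor columns explicitly; since R 4.0.0 the default is character
A <- data.frame(v1, v2, stringsAsFactors = TRUE)

# Values of v1 that are absent from v2, one entry per distinct value
uni <- unique(A$v1[!A$v1 %in% A$v2])
levels(uni)   # still lists all four original levels of v1

# Drop the levels that no longer occur in the result
uni <- droplevels(uni)
levels(uni)   # only the level that is actually present
```

If the columns are plain character vectors, unique() alone is enough and droplevels() is not needed.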

Jaap Apr 30 At 15:04

Rich scriven · Accepted Answer · 2015-04-30T15:05:37+0000

You could try setdiff():

with(A, setdiff(v1, v2))
# [1] "b2"
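
As a quick check that setdiff() and the unique()/%in% approach agree, here is a small sketch. It assumes the columns are plain character vectors, forced here with stringsAsFactors = FALSE so the behaviour does not depend on the R version; the names res_setdiff and res_unique are only illustrative:

```r
v1 <- c("a1","a1","b2","b2","d4","c3","d4")
v2 <- c("a1","c3","d4","d4","e5","f6","g7")
# Keep the columns as character vectors regardless of R version
A <- data.frame(v1, v2, stringsAsFactors = FALSE)

# Set difference: distinct values of v1 that do not occur in v2
res_setdiff <- with(A, setdiff(v1, v2))

# Same idea spelled out with %in% and unique()
res_unique <- unique(A$v1[!A$v1 %in% A$v2])

# Both approaches return the same single value
identical(res_setdiff, res_unique)
```

setdiff() already removes duplicates, so no extra unique() call is needed around it.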
