Column of unique values between two other columns
Sample data:
col1 col2
<NA> cc
a a
ab a
z a
I want to add a column, unique, that holds, for each row, any characters that are not shared between col1 and col2:
col1 col2 unique
<NA> cc cc
a a
ab a b
z a za
I tried to use setdiff, but it did not work. For replication purposes, here is the data:
df <- read.table(header=TRUE, stringsAsFactors = FALSE, text =
"col1 col2
NA cc
a a
ab a
z a
")
I used setdiff like this:
df$unique <- paste0(setdiff(df$col1, df$col2), setdiff(df$col2, df$col1))
But it returns an error:
Error in `$<-.data.frame`(`*tmp*`, "unique", value = c("<NA>cc", "abcc" :
replacement has 2 rows, data has 3
Judging by the error, setdiff computes the difference between the two columns as whole vectors, not between the elements in each row.
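To illustrate that observation, here is a minimal sketch using the four-row sample data from above. The names only_in_col1, only_in_col2 and combined are made up for this illustration. It shows that setdiff treats each column as one set, so the length of the pasted result is unrelated to the number of rows.

```r
# Sample data from the question (four rows, after the edit)
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
"col1 col2
NA cc
a a
ab a
z a
")

# setdiff() compares the columns as whole sets, not row by row
only_in_col1 <- setdiff(df$col1, df$col2)  # values of col1 absent from col2
only_in_col2 <- setdiff(df$col2, df$col1)  # values of col2 absent from col1

# paste0() recycles the shorter vector, so the result has as many
# elements as the longer set difference, not one element per row
combined <- paste0(only_in_col1, only_in_col2)

length(combined)  # differs from nrow(df), hence the assignment error
nrow(df)
```

Because length(combined) and nrow(df) differ, assigning the result to df$unique fails with the "replacement has ... rows" error.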
Edit: Added the z, a row as the last line of the sample data.
3 answers
Here is a somewhat lengthy method using apply.
apply(df, 1, function(i) {
  i <- i[!is.na(i)]                     # remove NAs
  if (length(i) == 1) {
    i                                   # return singletons untouched
  } else {                              # for non-singletons
    i <- unlist(strsplit(i, split = ""))  # split into a vector of characters
    i <- i[!(duplicated(i) | duplicated(i, fromLast = TRUE))]  # drop every duplicated character
    paste(i, collapse = "")             # collapse the remaining characters into one string
  }
})
[1] "cc" "" "b"
Note that for c("cc", "a", "c") this will return "a", because every "c", including the two inside "cc", will be marked as a duplicate.
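If that behavior is not wanted, one possible variant is sketched below. It assumes "not shared" means characters that occur in one string but not at all in the other, and it treats NA as an empty string. The helper name unshared is hypothetical and not part of the original answer. It uses %in% rather than setdiff so that repeated characters such as "cc" are kept.

```r
# Sample data from the question
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
"col1 col2
NA cc
a a
ab a
z a
")

# Hypothetical helper: characters of a that do not occur in b,
# followed by characters of b that do not occur in a
unshared <- function(a, b) {
  x <- if (is.na(a)) character(0) else strsplit(a, "")[[1]]
  y <- if (is.na(b)) character(0) else strsplit(b, "")[[1]]
  # %in% keeps repeats (unlike setdiff, which would reduce "cc" to "c")
  paste(c(x[!x %in% y], y[!y %in% x]), collapse = "")
}

# Apply the helper row by row; USE.NAMES = FALSE avoids naming the
# result after the values in col1
df$unique <- mapply(unshared, df$col1, df$col2, USE.NAMES = FALSE)
df
```

With this definition, a pair like "cc" and "c" yields an empty string, since "c" occurs in both values.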