R - select elements if the vector contains more (or fewer) than n instances
I need to change the value of elements in a vector, but only for the elements that have fewer than n instances.
I used this method, where Data$GENE is the vector to be modified:
Data$GENE[which(Data$GENE %in% names(table(Data$GENE)[table(Data$GENE) < 10]))] <- 'other'
It's a bit convoluted; is there a more succinct way?
UPDATE: In response to the comments below, here is a fairly simple example:
> vec <- c(rep('foo', 5), rep('foo1', 2), rep('foo2', 1), rep('foo3', 3), rep('bar', 6))
> table(vec)
vec
 bar  foo foo1 foo2 foo3
   6    5    2    1    3
> vec[which(vec %in% names(table(vec)[table(vec) < 5]))] <- 'other'
> table(vec)
vec
  bar   foo other
    6     5     6
4 answers
I think what you are describing can be accomplished with ave from base R. Here we replace the values that occur fewer than five times.
vec[ave(seq_along(vec), vec, FUN=length) < 5] <- "other"
table(vec)
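To see why this works, it can help to look at what ave returns on its own. The sketch below rebuilds the example vector from the question under the illustrative name v (counts is likewise just a name chosen here). For every element, ave returns the size of the group that element belongs to, so comparing against 5 gives a logical mask for the rare values.

```r
# Fresh copy of the example vector from the question (name `v` is illustrative)
v <- c(rep('foo', 5), rep('foo1', 2), rep('foo2', 1), rep('foo3', 3), rep('bar', 6))

# For each position, the number of elements sharing that position's value
counts <- ave(seq_along(v), v, FUN = length)
counts

# TRUE where the value occurs fewer than 5 times
mask <- counts < 5
sum(mask)
```

Passing seq_along(v) rather than v itself keeps the result numeric: ave writes the group results back into a copy of its first argument, so a character input would give counts stored as strings.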
We can wrap this in a convenient helper function:
haslessthan <- function(x, n) ave(seq_along(x), x, FUN=length) < n
vec[haslessthan(vec, 5)] <- "other"
Either way, the result is
vec
  bar   foo other
    6     5     6
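If a lookup reads more naturally than ave, a similar effect can probably be had by indexing the frequency table with the values themselves. This is only a sketch of an alternative, not part of the approach above; v, counts and result are illustrative names.

```r
# Fresh copy of the example vector from the question (name `v` is illustrative)
v <- c(rep('foo', 5), rep('foo1', 2), rep('foo2', 1), rep('foo3', 3), rep('bar', 6))

# Frequency of each distinct value
counts <- table(v)

# Look up each element's frequency by name, then replace the rare ones
v[as.vector(counts[v]) < 5] <- "other"

result <- table(v)
result
```

This computes table only once, unlike the original one-liner, which calls it twice.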