R - select elements of a vector that occur more (or fewer) than n times

I need to change the value of elements in a vector, but only for the elements that occur fewer than n times.

I used the following approach, where Data$GENE is the vector to be modified.

Data$GENE[which(Data$GENE %in% names(table(Data$GENE)[table(Data$GENE) < 10]))] <<- 'other'

      

This is a bit confusing. Is there a more succinct way?

UPDATE: In response to the comments below, here is a fairly simple example:

> vec <- c(rep('foo', 5), rep('foo1', 2), rep('foo2', 1), rep('foo3', 3), rep('bar', 6))
> table(vec)
vec
 bar  foo foo1 foo2 foo3 
   6    5    2    1    3 
> vec[which(vec %in% names(table(vec)[table(vec) < 5]))] <- 'other'
> table(vec)
vec
  bar   foo other 
    6     5     6

      

+3




4 answers


I would do it in two steps, so that it is less confusing, as you say, and the table only needs to be computed once. Also, you don't need which() the way you are using it in your approach: %in% already returns a logical vector that can be used for indexing directly.



y <- table(vec)
vec[vec %in% names(y[y < 5])] <- "other"
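
Applied to the data frame column from the question, the same two-step idea might look like the sketch below. The contents of Data are made up here, since the original data frame is not shown, and the names counts and rare are only illustrative.

```r
# Hypothetical stand-in for the asker's data frame; the real Data is not shown.
Data <- data.frame(
  GENE = c(rep("A", 12), rep("B", 3), rep("C", 10), rep("D", 1)),
  stringsAsFactors = FALSE
)

counts <- table(Data$GENE)            # compute the counts once
rare <- names(counts[counts < 10])    # labels seen fewer than 10 times
Data$GENE[Data$GENE %in% rare] <- "other"

table(Data$GENE)
```

Note that this assumes GENE is a character column. If it is a factor, convert it with as.character() first, because "other" is not an existing level and the assignment would produce NA values.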

      

+3




The summary method for factors supports this:



summary(factor(vec),maxsum=sum(table(vec)>=5)+1)
    bar     foo (Other) 
      6       5       6 
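
To spell out what the call above does: maxsum caps the number of entries in the summary, and the least frequent levels beyond that cap are lumped into "(Other)". The sketch below rebuilds the example with the intermediate value named; n_keep is just an illustrative name. It only produces the counts and leaves vec itself unchanged.

```r
vec <- c(rep("foo", 5), rep("foo1", 2), rep("foo2", 1), rep("foo3", 3), rep("bar", 6))

# Number of labels that meet the threshold of 5 occurrences
n_keep <- sum(table(vec) >= 5)

# Keep that many levels and leave one extra slot for the "(Other)" bucket
counts <- summary(factor(vec), maxsum = n_keep + 1)
counts
```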

      

+5




You can do this easily with data.table.

library(data.table)
data(mtcars)
setDT(mtcars, keep.rownames = TRUE)  # convert the data.frame to a data.table by reference

# add a count column with .N, then chain with [count < ...]
mtcars[, count := .N, by = cyl][count < 14]

      

+2




I think that what you are describing can be accomplished with the help of ave() in base R. Here we replace the values that occur fewer than five times.

vec[ave(seq_along(vec), vec, FUN=length) < 5] <- "other"
vec

      

We can wrap this in a friendly function:

haslessthan <- function(x, n) ave(seq_along(x), x, FUN=length) < n
vec[haslessthan(vec, 5)] <- "other"

      

Either way, the result is

vec
  bar   foo other 
    6     5     6 
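
To see why this works, it may help to look at the intermediate value: ave() with FUN=length returns, for every element, the size of the group that element belongs to. A minimal sketch on the original example vector (group_size is just an illustrative name):

```r
vec <- c(rep("foo", 5), rep("foo1", 2), rep("foo2", 1), rep("foo3", 3), rep("bar", 6))

# One value per element: how many times that element's label occurs in vec
group_size <- ave(seq_along(vec), vec, FUN = length)
group_size

# Comparing against the threshold gives the logical index used for replacement
is_rare <- group_size < 5
vec[is_rare] <- "other"
```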

      

+2

