The easiest way in R to get a vector of element frequencies in a vector

Suppose I have a vector v of values. What is the easiest way to get a vector f of the same length as v, where the i-th element of f is the frequency of the i-th element of v in v?

The only way I know seems unnecessarily complicated:

v = sample(1:10,100,replace=TRUE)
D = data.frame( idx=1:length(v), v=v )
E = merge( D, data.frame(table(v)) )
E = E[ with(E,order(idx)), ]
f = E$Freq

      

Surely there is an easier way to do this, something along the lines of "frequency(v)"?

+3




3 answers


For a vector v of small natural numbers, as in the question, the expression

tabulate(v)[v]

      

is particularly simple as well as fast.
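A small worked example may make the indexing trick clearer. The vector below is made up for illustration: tabulate(v) counts how often each integer from 1 to max(v) occurs, and indexing that count vector by v itself looks up the count for every element.

```r
v <- c(3, 1, 3, 2, 3, 1)   # hypothetical sample data
counts <- tabulate(v)      # occurrences of the values 1, 2, 3
f <- counts[v]             # look up the count for each element of v
print(counts)
print(f)
```

Note that this relies on the values being positive integers, since they are used directly as indices.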

For more general numeric vectors v, you can persuade ecdf to help, as in



w <- sapply(v, ecdf(v)) * length(v)
tabulate(w)[w]

      

It is probably best to code the underlying algorithm yourself, which also avoids the floating-point rounding error implicit in the previous solution:

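frequencies <- function(x) {
  i <- order(x)                                 # permutation that sorts x
  v <- x[i]                                     # the sorted values
  w <- cumsum(c(TRUE, v[-1] != v[-length(x)]))  # run ID, incremented whenever the value changes
  f <- tabulate(w)[w]                           # size of each run, repeated for each member
  return(f[order(i)])                           # undo the sort to restore input order
}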

      

This algorithm sorts the data, assigns sequential IDs 1, 2, 3, ... to the distinct values as it encounters them (by cumulatively summing a binary indicator of where the value changes), uses the previous tabulate()[] trick to get the frequencies efficiently, and then reorders the results so that the output matches the input, component by component.
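To make the steps concrete, here is a trace of the same computation on a short vector of non-integer values. The data are made up for illustration, and the intermediate name f_sorted is introduced only for this sketch.

```r
x <- c(2.5, 0.1, 2.5, 7, 0.1, 2.5)             # hypothetical sample data
i <- order(x)                                  # sorting permutation
v <- x[i]                                      # sorted values
w <- cumsum(c(TRUE, v[-1] != v[-length(x)]))   # sequential ID per distinct value
f_sorted <- tabulate(w)[w]                     # frequencies, still in sorted order
f <- f_sorted[order(i)]                        # frequencies in the original order
print(rbind(v = v, w = w, f_sorted = f_sorted))
print(f)
```

Each element of f should equal the number of times the corresponding element of x occurs in x.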

+2




Something like this works for me:



sapply(v, function(elmt, vec) sum(vec == elmt), vec=v)
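Because every element is compared against the whole vector, the work grows quadratically with length(v), which should be fine for short vectors. As a possible variant, here is a sketch using vapply, which also checks that each call returns a single integer. The sample data are made up.

```r
v <- c(3, 1, 3, 2, 3, 1)   # hypothetical sample data
# Count the matches for each element; integer(1) declares the expected result type
f <- vapply(v, function(elmt) sum(v == elmt), integer(1))
print(f)
```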

      

+1




I think this is the best solution:

ave(v,v,FUN=length)

      

This simply relies on ave() to replicate the return value of FUN() and map it back to each index of the input vector whose element was part of the group for which that particular call to FUN() was made.
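One caveat, offered as an observation rather than as part of the answer above: ave() writes the results back into a copy of its first argument, so the output takes on that argument's type. For a character vector, the counts would therefore come back as strings. A common workaround, sketched here with made-up data, is to pass seq_along(v) as the first argument so that the result is an integer vector.

```r
v <- c("b", "a", "b", "c", "b", "a")      # hypothetical sample data
# Group the positions 1..n by the values of v and take each group's size
f <- ave(seq_along(v), v, FUN = length)
print(f)
```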

+1

