The easiest way in R to get a vector of the frequencies of a vector's elements
Suppose I have a vector of values v. What is the easiest way to get a vector f of the same length as v, where the i-th element of f is the frequency of the i-th element of v in v?
The only way I know seems unnecessarily complicated:
v = sample(1:10,100,replace=TRUE)
D = data.frame( idx=1:length(v), v=v )
E = merge( D, data.frame(table(v)) )
E = E[ with(E,order(idx)), ]
f = E$Freq
Surely there is an easier way to do this, along the lines of "frequency(v)"?
For a vector v of small natural numbers, as in the question, the expression
tabulate(v)[v]
is particularly simple as well as fast.
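To see why the lookup works, here is a minimal sketch on a toy vector (the input values are hypothetical, chosen only for illustration). tabulate(v) returns a vector whose k-th entry is the number of times k occurs in v, so indexing it by v itself picks out the count belonging to each element:

```r
# Hypothetical toy input, chosen only for illustration
v <- c(3, 1, 3, 2, 3, 1)

counts <- tabulate(v)  # counts[k] is the number of times k occurs in v
counts                 # 2 1 3
f <- counts[v]         # look up the count that belongs to each element of v
f                      # 3 2 3 1 3 2
```

Note that tabulate() only counts positive integers, which is why this form is limited to vectors like the one in the question.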
For more general numeric vectors v, you can coax ecdf into helping, as in
w <- round(ecdf(v)(v) * length(v))  # number of elements <= each v[i]; round() guards against floating-point error
tabulate(w)[w]
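As a rough illustration of why this works (toy values, hypothetical): ecdf(v) evaluated at v[i], multiplied by length(v), is the number of elements less than or equal to v[i]. Equal values share the same cumulative count and different values get different ones, so the counts can serve as integer labels for the tabulate trick. The round() call in this sketch is my own guard against the product landing just below a whole number:

```r
# Hypothetical toy input with non-integer values
v <- c(2.5, 0.1, 2.5, 7)

# Number of elements <= each v[i]; round() is a guard added in this sketch
w <- round(ecdf(v)(v) * length(v))
w                    # 3 1 3 4
f <- tabulate(w)[w]  # same lookup as before, applied to the integer labels
f                    # 2 1 2 1
```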
It is probably best to code the underlying algorithm yourself, which also avoids the floating-point rounding error that the ecdf approach is exposed to:
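frequencies <- function(x) {
  i <- order(x)                                 # permutation that sorts x
  v <- x[i]                                     # the sorted values
  w <- cumsum(c(TRUE, v[-1] != v[-length(x)]))  # sequential ID per distinct value
  f <- tabulate(w)[w]                           # frequencies, in sorted order
  return(f[order(i)])                           # back to the original order
}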
This algorithm sorts the data, assigns sequential IDs 1, 2, 3, ... to the values as it encounters them (by cumulatively summing a binary indicator of where the value changes), uses the earlier tabulate()[] trick to get the frequencies efficiently, and then reorders the results so that the output matches the input, component by component.
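The steps can be traced on a small example. This is only a sketch: the input vector is hypothetical, the function is repeated so the block runs on its own, and the brute-force comparison at the end is there as a sanity check, not as a recommended method.

```r
# Same function as above, repeated so this block is self-contained
frequencies <- function(x) {
  i <- order(x)
  v <- x[i]
  w <- cumsum(c(TRUE, v[-1] != v[-length(x)]))
  f <- tabulate(w)[w]
  return(f[order(i)])
}

x <- c(0.5, -2, 0.5, 10, -2, 0.5)  # hypothetical toy input

i <- order(x)                                 # 2 5 1 3 6 4
v <- x[i]                                     # -2 -2 0.5 0.5 0.5 10
w <- cumsum(c(TRUE, v[-1] != v[-length(v)]))  # 1 1 2 2 2 3
tabulate(w)[w]                                # 2 2 3 3 3 1 (sorted order)
frequencies(x)                                # 3 2 3 1 2 3 (original order)

# Sanity check against a slow element-by-element count
brute <- sapply(x, function(e) sum(x == e))
stopifnot(all(frequencies(x) == brute))
```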