Why does applying log2 through the magrittr pipe give unexpected, incorrect values?
I am trying to calculate the entropy of a discrete distribution, and I noticed that the result when using magrittr is not what I expected. For example:
> x <- c("A","B","C","A","A","D")
> table(x)/length(x) %>% log2
x
        A         B         C         D
1.1605584 0.3868528 0.3868528 0.3868528
What is wrong? The log of a value less than 1 must be negative. If I break the calculation into separate steps, I get the correct answer:
> freq <- table(x)/length(x)
> log2(freq)
x
        A         B         C         D
-1.000000 -2.584963 -2.584963 -2.584963
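The one-liner and the two-step version differ because of operator precedence: in R, special operators such as `%>%` bind more tightly than `/` (see `?Syntax`). So `table(x)/length(x) %>% log2` is evaluated as `table(x) / log2(length(x))`, i.e. the counts 3, 1, 1, 1 divided by log2(6) ≈ 2.585, which matches the unexpected output. A minimal sketch that checks this, assuming magrittr is installed (the variable names are only illustrative):

```r
library(magrittr)

x <- c("A", "B", "C", "A", "A", "D")

# %>% binds more tightly than /, so this is parsed as table(x) / log2(length(x)):
# the counts (3, 1, 1, 1) divided by log2(6)
wrong <- table(x) / length(x) %>% log2
same_as_wrong <- table(x) / log2(length(x))
print(all.equal(wrong, same_as_wrong))  # TRUE

# Parenthesising the division pipes the relative frequencies into log2()
right <- (table(x) / length(x)) %>% log2

# The same result without a pipe
right_no_pipe <- log2(table(x) / length(x))

print(right)  # A = -1, B = C = D = log2(1/6), about -2.584963
```

Wrapping the left-hand side in parentheses (or avoiding the pipe for that step) is enough to get the negative values from the step-by-step version.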
If you have trouble with how the pipe is evaluated, it helps to break the calculation into the basic dplyr verbs (select, mutate, filter, etc.) so that it is more obvious what each step does.
library(tidyverse)

tibble(value = x) %>%            # one-column tibble (tbl_df() is deprecated)
  group_by(value) %>%
  summarise(n = n()) %>%           # count of each symbol
  mutate(freq = n / sum(n)) %>%    # relative frequency
  mutate(log = log2(freq))         # log2 of each frequency
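Since the goal in the question is the entropy, the same step-by-step pipeline can be carried one step further. A minimal sketch, assuming Shannon entropy in bits (H = -sum(p * log2(p)) over the observed symbols) is what is wanted; it uses dplyr's `count()` in place of `group_by()` + `summarise(n = n())`, and the names `entropy_bits`, `p` and `entropy_base` are illustrative only:

```r
library(dplyr)

x <- c("A", "B", "C", "A", "A", "D")

# Shannon entropy in bits: H = -sum(p * log2(p))
entropy_bits <- tibble(value = x) %>%
  count(value) %>%                   # one row per symbol, counts in column n
  mutate(freq = n / sum(n)) %>%      # relative frequency p
  summarise(H = -sum(freq * log2(freq))) %>%
  pull(H)

# The same calculation in base R; the division happens before log2()
p <- table(x) / length(x)
entropy_base <- -sum(p * log2(p))

print(entropy_bits)  # 1.792481
```

Both versions compute the division first and only then take the logarithm, which is exactly the order the original one-liner lost.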