Why does applying log2 through a magrittr pipe give unexpected, incorrect values?

I am trying to calculate the entropy of a discrete distribution and I noticed that the behavior using magrittr is not what I was expecting. To give an example:

> x <- c("A","B","C","A","A","D")                                                                                                 
> table(x)/length(x) %>% log2                                                                                                     
x
        A         B         C         D
 1.1605584 0.3868528 0.3868528 0.3868528

      

This is wrong: the logs of values less than 1 must be negative. If I break down the steps, I get the correct answer:

> freq <- table(x)/length(x)                                                                                                      
> log2(freq)                                                                                                                      
 x
         A         B         C         D
 -1.000000 -2.584963 -2.584963 -2.584963

      



2 answers


This might work, if you are a fan of many pipes :)

library(magrittr)
x %>% table %>% divide_by(x %>% length) %>% log2

      



magrittr also offers aliases such as divide_by, multiply_by, etc. You can also skip the aliases and call the operator itself, quoted in backticks, with the following syntax:

x %>% table %>% `/`(x %>% length) %>% log2
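
Neither variant spells out why the original expression went wrong. Going by R's operator precedence rules, `%any%` operators such as `%>%` bind more tightly than `/`, so `table(x)/length(x) %>% log2` is grouped as `table(x) / (length(x) %>% log2)`. That would also explain the numbers in the question (3 / log2(6) is about 1.16). Here is a minimal sketch that checks this reading and shows that parenthesising the division is enough. The names `piped`, `explicit` and `fixed` are only illustrative:

```r
library(magrittr)

x <- c("A", "B", "C", "A", "A", "D")

# The expression from the question, unchanged.
piped <- table(x) / length(x) %>% log2

# How R groups it: the counts are divided by log2(length(x)).
explicit <- table(x) / log2(length(x))

# Parenthesising the division pipes the frequencies into log2 as intended.
fixed <- (table(x) / length(x)) %>% log2

isTRUE(all.equal(piped, explicit))  # both spellings give the same values
all(fixed < 0)                      # logs of frequencies below 1 are negative
```

The parenthesised form keeps the original one-liner almost intact, while the `divide_by` and backtick versions avoid the precedence question altogether by never mixing `/` with the pipe.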

      



If you have problems using the pipe, it helps to use the basic dplyr verbs (select, mutate, filter, etc.) to make it more obvious what you are trying to do.



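library(tidyverse)
x <- c("A", "B", "C", "A", "A", "D")
tibble(value = x) %>%            # Put the vector in a one-column tibble (tbl_df() is deprecated)
  group_by(value) %>%
  summarise(n = n()) %>%         # Count occurrences of each value
  mutate(freq = n / sum(n)) %>%  # Calculate frequency
  mutate(log = log2(freq))       # Take log2 of each frequency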
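
Since the question's stated goal is the entropy of the distribution, the same pipeline can be taken one step further. This is a sketch only, assuming the usual Shannon entropy in bits, H = -sum(p * log2(p)). It uses `count()` as a shortcut for the `group_by()`/`summarise()` pair, and the name `entropy` is illustrative:

```r
library(dplyr)

x <- c("A", "B", "C", "A", "A", "D")

entropy <- tibble(value = x) %>%
  count(value) %>%                                  # one row per value, count in column n
  mutate(freq = n / sum(n)) %>%                     # relative frequencies
  summarise(entropy = -sum(freq * log2(freq))) %>%  # Shannon entropy in bits
  pull(entropy)                                     # extract the single number

entropy  # about 1.79 bits for this x
```

Because every arithmetic step sits inside a `mutate()` or `summarise()` call, no bare `/` is mixed with `%>%`, so the precedence issue from the question cannot arise here.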

      
