Calculate mean and STD lookup of a dynamic set of columns

Calculate Average and STD for setting a dynamic set of columns.

to show this as an example:

sales <- data.frame(ItemID=c("1A","1B","1C"),
           Jul=c(0,1,5),
           Aug=c(1,2,6),
           Sep=c(0,3,7),
           Oct=c(1,4,8),
           Nov=c(1,4,8),
           Dec=c(1,4,8),
           Jan=c(1,4,8),
           Nmon=c(7,4,6))

      

The above test data gives the table below. I will need to apply the functions on each row based on the value in the "Nmon" column.

  ItemID Jul Aug Sep Oct Nov Dec Jan Nmon
1     1A   0   1   0   1   1   1   1    7
2     1B   1   2   3   4   4   4   4    4
3     1C   5   6   7   8   8   8   8    6

      

eg. the first entry has an Nmon value of 7. Then I need to calculate the mean and standard deviation of all values ​​from July to January (mean value = 0.71, std = 0.49).

In the case of the second record, where the Nmon value is 4, the mean and standard deviation should only be calculated for months starting from Oct-Jan (mean value = 4, std = 0)

Here the number of months will increase or decrease, but the first (position number) and last (Nmon) columns will remain the same. I have a large dataset and need an efficient way to perform these calculations.

+3


source to share


2 answers


Maybe it helps



 t(apply(sales[-1], 1, function(x) {i1 <- length(x)
        x2 <- x[(i1 -x[i1]):(i1-1)]
       c(mean = mean(x2), sd = sd(x2))}))
 #        mean      sd
 #[1,] 0.7142857 0.48795
 #[2,] 4.0000000 0.00000
 #[3,] 7.5000000 0.83666

      

+1


source


Here's another solution with an R base:



sales <- data.frame(ItemID=c("1A","1B","1C"),
                    Jul=c(0,1,5),
                    Aug=c(1,2,6),
                    Sep=c(0,3,7),
                    Oct=c(1,4,8),
                    Nov=c(1,4,8),
                    Dec=c(1,4,8),
                    Jan=c(1,4,8),
                    Nmon=c(7,4,6))
my.m.sd <- function(i) {
  n <- sales$Nmon[i]
  x <- unlist(sales[i, seq(to=8, length.out = n)])
  c(m=mean(x), s=sd(x))
}
sapply(1:3, my.m.sd)
# > sapply(1:3, my.m.sd)
#        [,1] [,2]    [,3]
# m 0.7142857    4 7.50000
# s 0.4879500    0 0.83666

      

+1


source







All Articles