Calculate mean and STD lookup of a dynamic set of columns
Calculate Average and STD for setting a dynamic set of columns.
to show this as an example:
sales <- data.frame(ItemID=c("1A","1B","1C"),
Jul=c(0,1,5),
Aug=c(1,2,6),
Sep=c(0,3,7),
Oct=c(1,4,8),
Nov=c(1,4,8),
Dec=c(1,4,8),
Jan=c(1,4,8),
Nmon=c(7,4,6))
The above test data gives the table below. I will need to apply the functions on each row based on the value in the "Nmon" column.
ItemID Jul Aug Sep Oct Nov Dec Jan Nmon
1 1A 0 1 0 1 1 1 1 7
2 1B 1 2 3 4 4 4 4 4
3 1C 5 6 7 8 8 8 8 6
eg. the first entry has an Nmon value of 7. Then I need to calculate the mean and standard deviation of all values from July to January (mean value = 0.71, std = 0.49).
In the case of the second record, where the Nmon value is 4, the mean and standard deviation should only be calculated for months starting from Oct-Jan (mean value = 4, std = 0)
Here the number of months will increase or decrease, but the first (position number) and last (Nmon) columns will remain the same. I have a large dataset and need an efficient way to perform these calculations.
source to share
Here's another solution with an R base:
sales <- data.frame(ItemID=c("1A","1B","1C"),
Jul=c(0,1,5),
Aug=c(1,2,6),
Sep=c(0,3,7),
Oct=c(1,4,8),
Nov=c(1,4,8),
Dec=c(1,4,8),
Jan=c(1,4,8),
Nmon=c(7,4,6))
my.m.sd <- function(i) {
n <- sales$Nmon[i]
x <- unlist(sales[i, seq(to=8, length.out = n)])
c(m=mean(x), s=sd(x))
}
sapply(1:3, my.m.sd)
# > sapply(1:3, my.m.sd)
# [,1] [,2] [,3]
# m 0.7142857 4 7.50000
# s 0.4879500 0 0.83666
source to share