Aggregation of multiple subtotals?
Is there a way to aggregate multiple subtotals using reshape2? For example, for the airquality dataset:
require(reshape2)
require(plyr)
names(airquality) <- tolower(names(airquality))
aqm <- melt(airquality, id=c("month", "day"), na.rm=TRUE)
aqm <- subset(aqm, month %in% 5:6 & day %in% 1:7)
I can make an intermediate column for each month that has the average of all variables for that month:
dcast(aqm, day ~ month+variable, mean, margins = "variable")
day 5_ozone 5_solar.r 5_wind 5_temp 5_(all) 6_ozone 6_solar.r
1 1 41 190 7.4 67 76.350 NaN 286
2 2 36 118 8.0 72 58.500 NaN 287
3 3 12 149 12.6 74 61.900 NaN 242
4 4 18 313 11.5 62 101.125 NaN 186
5 5 NaN NaN 14.3 56 35.150 NaN 220
6 6 28 NaN 14.9 66 36.300 NaN 264
7 7 23 299 8.6 65 98.900 29 127
6_wind 6_temp 6_(all)
1 8.6 78 124.20000
2 9.7 74 123.56667
3 16.1 67 108.36667
4 9.2 84 93.06667
5 8.6 85 104.53333
6 14.3 79 119.10000
7 9.7 82 61.92500
I can also make an intermediate column for each variable that has the average for all months inside that variable:
dcast(aqm, day ~ variable+month, mean, margins = "month")
day ozone_5 ozone_6 ozone_(all) solar.r_5 solar.r_6 solar.r_(all)
1 1 41 NaN 41 190 286 238.0
2 2 36 NaN 36 118 287 202.5
3 3 12 NaN 12 149 242 195.5
4 4 18 NaN 18 313 186 249.5
5 5 NaN NaN NaN NaN 220 220.0
6 6 28 NaN 28 NaN 264 264.0
7 7 23 29 26 299 127 213.0
wind_5 wind_6 wind_(all) temp_5 temp_6 temp_(all)
1 7.4 8.6 8.00 67 78 72.5
2 8.0 9.7 8.85 72 74 73.0
3 12.6 16.1 14.35 74 67 70.5
4 11.5 9.2 10.35 62 84 73.0
5 14.3 8.6 11.45 56 85 70.5
6 14.9 14.3 14.60 66 79 72.5
7 8.6 9.7 9.15 65 82 73.5
Is there a way to tell reshape2 to calculate both sets of subtotals in one command? This command is close: it adds the grand total, but omits the monthly subtotals:
dcast(aqm, day ~ variable+month, mean, margins = c("variable", "month"))
If I understand the question correctly, you can use
acast(aqm, day ~ variable ~ month, mean, margins = c("variable", "month"))[,,'(all)']
acast computes a summary for each day, for each variable, for each month, and returns it as a three-dimensional array. The "(all)" slice along the month dimension ([,,'(all)']) has a row for each day, with a column for each variable (averaged over all months) and a column "(all)" averaging each day across all variables and all months.
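If the monthly subtotals are wanted alongside the per-variable ones, the same array can be sliced along the variable dimension too. The following is only a minimal sketch, assuming the array has the day x variable x month layout described above; the names `aq`, `arr`, `by_variable`, `by_month`, `both` and the `month_` column prefix are made up for illustration.

```r
library(reshape2)

# Same setup as in the question, on a copy so the built-in data set is untouched.
aq <- airquality
names(aq) <- tolower(names(aq))
aqm <- melt(aq, id = c("month", "day"), na.rm = TRUE)
aqm <- subset(aqm, month %in% 5:6 & day %in% 1:7)

# One call builds a day x variable x month array with "(all)" margins
# on both the variable and the month dimension.
arr <- acast(aqm, day ~ variable ~ month, mean,
             margins = c("variable", "month"))

# Per-variable means over both months, plus the grand "(all)" column.
by_variable <- arr[, , "(all)"]

# Per-month means over all variables; drop the "(all)" column here
# because it repeats the grand total already present in by_variable.
by_month <- arr[, "(all)", ]
by_month <- by_month[, colnames(by_month) != "(all)", drop = FALSE]

# Prefix the month columns so they are easy to tell apart (illustrative naming).
colnames(by_month) <- paste0("month_", colnames(by_month))

# Both sets of subtotals side by side, one row per day.
both <- cbind(by_variable, by_month)
both
```

This still takes two slices rather than a single dcast call, but the aggregation itself happens only once in acast.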
Is this what you need?