# Creating a new framework using weighted averages from data in a list

I have a lot of data stored in a list and I want to create weighted averages of them and store the results in a new dataframe. For example, with a list:

```
dfs <- structure(list(df1 = structure(list(A = 4:5, B = c(8L, 4L), Weight = c(TRUE, TRUE), Site = c("X", "X")),
.Names = c("A", "B", "Weight", "Site"), row.names = c(NA, -2L), class = "data.frame"),
df2 = structure(list(A = c(6L, 8L), B = c(9L, 4L), Weight = c(FALSE, TRUE), Site = c("Y", "Y")),
.Names = c("A", "B", "Weight", "Site"), row.names = c(NA, -2L), class = "data.frame")),
.Names = c("df1", "df2"))
```

In this example, I want to use columns `A`

, `B`

and `Weight`

for weighted averages. I also want to move related data like `Site`

, and want to sum the quantity `TRUE`

and `FALSE`

. My desired output would look something like this:

```
result <- structure(list(Site = structure(1:2, .Label = c("X", "Y"), class = "factor"),
A.Weight = c(4.5, 8), B.Weight = c(6L, 4L), Sum.Weight = c(2L,
1L)), .Names = c("Site", "A.Weight", "B.Weight", "Sum.Weight"
), class = "data.frame", row.names = c(NA, -2L))
Site A.Weight B.Weight Sum.Weight
1 X 4.5 6 2
2 Y 8.0 4 1
```

The above example is a very simple example, but my real data contains a lot of data in the list and many more columns than simple `A`

and `B`

for which I want to calculate weighted averages. I also have a few columns similar to `Site`

that are constant in each dataframe and that I want to go to the result.

I can manually calculate the weighted averages using something like

```
weighted.mean(dfs$df1$A, dfs$df1$Weight)
weighted.mean(dfs$df1$B, dfs$df1$Weight)
weighted.mean(dfs$df2$A, dfs$df2$Weight)
weighted.mean(dfs$df2$B, dfs$df2$Weight)
```

but I'm not sure how I can make it shorter, less "manual". Does anyone have any recommendations? I recently found out how `lapply`

through dataframes in the list, but my attempts have not been that great so far.

source to share

The trick is to create a function that works for one data.frame and then use it `lapply`

to iterate over your list. Since it `lapply`

returns a list, we'll use the `do.call`

to `rbind`

result objects together:

```
foo <- function(data, meanCols = LETTERS[1:2], weightCol = "Weight", otherCols = "Site") {
means <- t(sapply(data[, meanCols], weighted.mean, w = data[, weightCol]))
sumWeight <- sum(data[, weightCol])
others <- data[1, otherCols, drop = FALSE] #You said all the other data was constant, so we can just grab first row
out <- data.frame(others, means, sumWeight)
return(out)
}
```

In action:

```
do.call(rbind, lapply(dfs, foo))
---
Site A B sumWeight
df1 X 4.5 6 2
df2 Y 8.0 4 1
```

Since you said it was a minimal example, here's one approach to expanding this for other columns. We will `grepl()`

and will use regular expressions to determine the correct columns. Alternatively, you can write them all into a vector. Something like that:

```
do.call(rbind, lapply(dfs, foo,
meanCols = grepl("A|B", names(dfs[[1]])),
otherCols = grepl("Site", names(dfs[[1]]))
))
```

source to share

through `dplyr`

```
library(dplyr)
library('devtools')
install_github('hadley/tidyr')
library(tidyr)
unnest(dfs) %>%
group_by(Site) %>%
filter(Weight) %>%
mutate(Sum=n()) %>%
select(-Weight) %>%
summarise_each(funs(mean=mean(., na.rm=TRUE)))
```

gives the result

```
# Site A B Sum
#1 X 4.5 6 2
#2 Y 8.0 4 1
```

Or using `data.table`

```
library(data.table)
DT <- rbindlist(dfs)
DT[(Weight)][, c(lapply(.SD, mean, na.rm = TRUE),
Sum=.N), by = Site, .SDcols = c("A", "B")]
# Site A B Sum
#1: X 4.5 6 2
#2: Y 8.0 4 1
```

### Update

In response to @jazzuro's comment using `dplyr 0.3`

, I get

```
unnest(dfs) %>%
group_by(Site) %>%
summarise_each(funs(weighted.mean=stats::weighted.mean(., Weight),
Sum.Weight=sum(Weight)), -starts_with("Weight")) %>%
select(Site:B_weighted.mean, Sum.Weight=A_Sum.Weight)
# Site A_weighted.mean B_weighted.mean Sum.Weight
#1 X 4.5 6 2
#2 Y 8.0 4 1
```

source to share