Summing data frame sections in R

Question

For sample data:

df <- structure(list(id = 1:10, group.id = structure(c(1L, 1L, 1L, 
2L, 2L, 2L, 3L, 3L, 3L, 1L), .Label = c("a", "b", "c"), class = "factor"), 
    x = c(2.12, 1.23, 2.36, 4.21, 2.36, NA, 2.36, 4.36, 1.23, 
    2.23), y = c(6.56, 2.36, NA, 4.36, 1.23, 8.56, 4.23, 5.36, 
    2.36, 1.23), z = c(4.36, NA, 5.23, 5.36, 1.23, 4.23, 1.23, 
    NA, 3.26, 2.23), group.x = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA), group.y = c(NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), group.z = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA)), .Names = c("id", "group.id", "x", "y", "z", "group.x", 
"group.y", "group.z"), class = "data.frame", row.names = c(NA, 
-10L))

I want to fill group.x, group.y and group.z with the averages of the x, y and z columns, computed by group.id.

So, the values for ids 1, 2, 3 and 10 (group a) are averaged and filled into the corresponding columns "group.x", "group.y" and "group.z". The same is then done for groups b and c.

Ideally, I would also like an additional table that lists each group with its number of values and its means, so I can judge how representative the means are. With my basic knowledge of R, I would just subset the data frame and compute the mean and count for each section; however, there must be a better way. Any ideas?

+3

r

KT_1 May 14 '15 at 15:01

2 answers

How about dplyr?

library(dplyr)

# Overwrite the group.* columns with the per-group means, ignoring NAs
df %>%
  group_by(group.id) %>%
  mutate(group.x = mean(x, na.rm = TRUE),
         group.y = mean(y, na.rm = TRUE),
         group.z = mean(z, na.rm = TRUE))
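
The question also asks for a separate table with the number of values and the mean per group. A minimal sketch of one way to build it with summarise(), assuming the sample data from the question (rebuilt here without the empty group.* columns so the block runs on its own). The n.* and mean.* column names and the group_summary object are made up for illustration:

```r
library(dplyr)

# Sample data from the question (value columns only)
df <- data.frame(
  id = 1:10,
  group.id = factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "a")),
  x = c(2.12, 1.23, 2.36, 4.21, 2.36, NA, 2.36, 4.36, 1.23, 2.23),
  y = c(6.56, 2.36, NA, 4.36, 1.23, 8.56, 4.23, 5.36, 2.36, 1.23),
  z = c(4.36, NA, 5.23, 5.36, 1.23, 4.23, 1.23, NA, 3.26, 2.23)
)

# One row per group: count of non-missing values and their mean
group_summary <- df %>%
  group_by(group.id) %>%
  summarise(
    n.x = sum(!is.na(x)), mean.x = mean(x, na.rm = TRUE),
    n.y = sum(!is.na(y)), mean.y = mean(y, na.rm = TRUE),
    n.z = sum(!is.na(z)), mean.z = mean(z, na.rm = TRUE)
  )

group_summary
```

Counting with sum(!is.na(...)) rather than n() reports how many values actually went into each mean, which is what matters when judging how representative it is.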

+1

Shenglin chen May 14 '15 at 15:31


akrun · Accepted Answer · 2015-05-14T15:04:27+0000

We could use data.table to create new columns holding the mean of the 'x', 'y' and 'z' values, grouped by the 'group.id' column. We convert the "data.frame" to a "data.table" with setDT(df1) (alternatively, we can use as.data.table, as suggested by @Ricardo Saporta; one advantage of that is that the original dataset remains unchanged. I prefer setDT, but that is purely subjective). We don't need to create the NA columns in the original dataset beforehand.

library(data.table)
setDT(df1)[, paste('group', c('x', 'y', 'z'), sep=".") := 
    lapply(.SD, mean, na.rm=TRUE), group.id, .SDcols=c('x','y','z')]

If the NA columns already exist, make sure their class matches the values being assigned, e.g. by converting them to "numeric" first:

setDT(df1)[, 6:8 := lapply(.SD, as.numeric), .SDcols=6:8][, 
   paste('group', c('x', 'y', 'z'), sep=".") := 
   lapply(.SD, mean, na.rm=TRUE), group.id, .SDcols=c('x','y','z')]
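
For comparison, here is a rough sketch of the as.data.table alternative mentioned above. It assumes df1 holds the question's sample data without the pre-created NA columns; the names dt and cols are introduced only for this example. Because as.data.table returns a copy, the := assignment changes dt and leaves df1 as it was:

```r
library(data.table)

# Sample data from the question, without the empty group.* columns
df1 <- data.frame(
  id = 1:10,
  group.id = factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "a")),
  x = c(2.12, 1.23, 2.36, 4.21, 2.36, NA, 2.36, 4.36, 1.23, 2.23),
  y = c(6.56, 2.36, NA, 4.36, 1.23, 8.56, 4.23, 5.36, 2.36, 1.23),
  z = c(4.36, NA, 5.23, 5.36, 1.23, 4.23, 1.23, NA, 3.26, 2.23)
)

cols <- c("x", "y", "z")

# as.data.table() copies the data, so df1 is not modified by := below
dt <- as.data.table(df1)
dt[, paste("group", cols, sep = ".") := lapply(.SD, mean, na.rm = TRUE),
   by = group.id, .SDcols = cols]

names(df1)  # still only id, group.id, x, y, z
names(dt)   # additionally group.x, group.y, group.z
```

With setDT(df1), by contrast, df1 itself becomes a data.table and gains the new columns by reference, which avoids the copy at the cost of altering the original object.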
