Summarising data frame sections in R

For sample data:

df1 <- structure(list(id = 1:10, group.id = structure(c(1L, 1L, 1L, 
2L, 2L, 2L, 3L, 3L, 3L, 1L), .Label = c("a", "b", "c"), class = "factor"), 
    x = c(2.12, 1.23, 2.36, 4.21, 2.36, NA, 2.36, 4.36, 1.23, 
    2.23), y = c(6.56, 2.36, NA, 4.36, 1.23, 8.56, 4.23, 5.36, 
    2.36, 1.23), z = c(4.36, NA, 5.23, 5.36, 1.23, 4.23, 1.23, 
    NA, 3.26, 2.23), group.x = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA), group.y = c(NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), group.z = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA)), .Names = c("id", "group.id", "x", "y", "z", "group.x", 
"group.y", "group.z"), class = "data.frame", row.names = c(NA, 
-10L))

      

I want to fill group.x, group.y and group.z with the averages of the x, y and z columns, by group.id.

So the x, y and z values in the rows with IDs 1, 2, 3 and 10 (group a) are averaged, and the results are filled into the corresponding columns "group.x", "group.y" and "group.z" for those rows. The same is then done for groups b and c.

Ideally, I would also like an additional table that describes each group with the number of values and the means, so I can judge how representative the means are. With my basic knowledge of R, I would just subset the data frame and compute the average and count for each section, but there must be a better way. Any ideas?



2 answers


We could use data.table to create new columns holding the mean of 'x', 'y' and 'z', grouped by the 'group.id' column. We convert the "data.frame" to a "data.table" with setDT(df1). (Alternatively, we can use as.data.table, as suggested by @Ricardo Saporta; one advantage is that the original dataset remains unchanged. I prefer setDT, but that is only a subjective preference.) We don't need to create the NA columns in the original dataset.

library(data.table)
setDT(df1)[, paste('group', c('x', 'y', 'z'), sep=".") := 
    lapply(.SD, mean, na.rm=TRUE), group.id, .SDcols=c('x','y','z')]
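
The as.data.table alternative mentioned above could look roughly like the sketch below. It assumes the sample data without the NA placeholder columns; the names dt, cols and new_cols are only illustrative and are not part of the original answer.

```r
library(data.table)

# Sample values copied from the question, without the NA placeholder columns
df1 <- data.frame(
  id = 1:10,
  group.id = factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "a")),
  x = c(2.12, 1.23, 2.36, 4.21, 2.36, NA, 2.36, 4.36, 1.23, 2.23),
  y = c(6.56, 2.36, NA, 4.36, 1.23, 8.56, 4.23, 5.36, 2.36, 1.23),
  z = c(4.36, NA, 5.23, 5.36, 1.23, 4.23, 1.23, NA, 3.26, 2.23)
)

# as.data.table() returns a copy, so df1 itself is not modified
dt <- as.data.table(df1)

cols <- c("x", "y", "z")
new_cols <- paste("group", cols, sep = ".")

# Add the per-group means by reference to the copy only
dt[, (new_cols) := lapply(.SD, mean, na.rm = TRUE),
   by = group.id, .SDcols = cols]

print(dt)
```

Here dt gains the three group.* columns while df1 keeps its original five columns, which is the trade-off against setDT: an extra copy in memory in exchange for an untouched input.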

      



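If the NA columns already exist, as in the sample data, first make sure their class matches the values being assigned (here "numeric"). Columns that contain only NA are of class "logical", so they are converted with as.numeric before the means are assigned: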

setDT(df1)[, 6:8 := lapply(.SD, as.numeric), .SDcols=6:8][, 
   paste('group', c('x', 'y', 'z'), sep=".") := 
   lapply(.SD, mean, na.rm=TRUE), group.id, .SDcols=c('x','y','z')]
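
For the second part of the question (a separate table with counts and means per group), one possible data.table sketch is shown below. It is not part of the original answer; the names dt, group_summary, rows, n.* and mean.* are made up for illustration, and only the columns needed for the summary are recreated.

```r
library(data.table)

# Sample values copied from the question (only the columns needed here)
dt <- data.table(
  group.id = factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "a")),
  x = c(2.12, 1.23, 2.36, 4.21, 2.36, NA, 2.36, 4.36, 1.23, 2.23),
  y = c(6.56, 2.36, NA, 4.36, 1.23, 8.56, 4.23, 5.36, 2.36, 1.23),
  z = c(4.36, NA, 5.23, 5.36, 1.23, 4.23, 1.23, NA, 3.26, 2.23)
)

# One row per group: number of rows, then for each column the number of
# non-missing values and the mean of those values
group_summary <- dt[, .(
  rows   = .N,
  n.x    = sum(!is.na(x)), mean.x = mean(x, na.rm = TRUE),
  n.y    = sum(!is.na(y)), mean.y = mean(y, na.rm = TRUE),
  n.z    = sum(!is.na(z)), mean.z = mean(z, na.rm = TRUE)
), by = group.id]

print(group_summary)
```

Comparing rows with the n.* columns shows how many values each mean is actually based on, for example that group b has three rows but only two non-missing x values.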

      



How about dplyr?



library(dplyr)
# df is the sample data frame from the question
df %>%
  group_by(group.id) %>%
  mutate(group.x = mean(x, na.rm = TRUE),
         group.y = mean(y, na.rm = TRUE),
         group.z = mean(z, na.rm = TRUE))
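
The same pipeline could also produce the separate summary table the question asks for by swapping mutate for summarise. This is only a sketch, not part of the original answer; the names group_summary, n.* and mean.* are illustrative, and the data frame is recreated with just the columns needed.

```r
library(dplyr)

# Sample values copied from the question (only the columns needed here)
df <- data.frame(
  group.id = factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "a")),
  x = c(2.12, 1.23, 2.36, 4.21, 2.36, NA, 2.36, 4.36, 1.23, 2.23),
  y = c(6.56, 2.36, NA, 4.36, 1.23, 8.56, 4.23, 5.36, 2.36, 1.23),
  z = c(4.36, NA, 5.23, 5.36, 1.23, 4.23, 1.23, NA, 3.26, 2.23)
)

# summarise() collapses each group to one row, whereas mutate() keeps
# every original row and repeats the group value
group_summary <- df %>%
  group_by(group.id) %>%
  summarise(
    n.x = sum(!is.na(x)), mean.x = mean(x, na.rm = TRUE),
    n.y = sum(!is.na(y)), mean.y = mean(y, na.rm = TRUE),
    n.z = sum(!is.na(z)), mean.z = mean(z, na.rm = TRUE)
  )

print(group_summary)
```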

      
