Collapse data file by name where whole columns are summed and factor columns inserted

Question

Collapse data file by name where whole columns are summed and factor columns inserted

I am trying to fold a dataframe using unique names that contain both optional integers and factor columns. The integer column must be summed for unique "Name" values, and the factor column must have all the values inserted next to each other as follows:

Name        Number         Location
RUDU          6               SiteA
RUDU          4               SiteB
YHBL          50              SiteA
YHBL          60              SiteB

Collapsed by name:

Name        Number         Location
RUDU          10              SiteA,SiteB
YHBL          110             SiteA,SiteB

I've played around a bit with ddplyr, which works for the whole part, but I don't understand how to aggregate the factorial part of my query.

+3

r

user1658170 01 june 15 at 17:35

source to share

3 answers

dplyr

library(dplyr)
d %>% 
    group_by_(~Name) %>% 
    summarize_(Number=~sum(Number), Location=~paste(Location, collapse=','))

base R

merge(aggregate(Number ~ Name, data=d, FUN=sum), aggregate(Location ~ Name, data=d, FUN=paste, collapse=','))

+2

Matthew plourde 01 june 15 at 17:45

source to share

Two more methods for reference.

Function: tapply ()

data.frame(
  Number = with(df1, tapply(Number, Name, sum)),
  Location = with(df1, tapply(Location, Name, toString))                  
)

#      Number     Location
# RUDU     10 SiteA, SiteB
# YHBL    110 SiteA, SiteB

Function: by ()

data.frame(cbind(
  Number = with(df1, by(Number, Name, sum)),
  Location =  with(df1, by(Location, Name, toString))
  )
)

#      Number     Location
# RUDU     10 SiteA, SiteB
# YHBL    110 SiteA, SiteB

Data

# df1 <- read.table(text='Name        Number         Location
# RUDU          6               SiteA
# RUDU          4               SiteB
# YHBL          50              SiteA
# YHBL          60              SiteB', header=T)

+1

Pierre lafortune 01 june 15 at 18:16

source to share

David Arenburg · Accepted Answer · 2015-06-01T17:41:46+0000

An approach is possible here data.table

library(data.table)
setDT(df)[, list(Mumber = sum(Number), Location = toString(Location)), by = Name]
#    Name Mumber     Location
# 1: RUDU     10 SiteA, SiteB
# 2: YHBL    110 SiteA, SiteB

And as you mentioned plyr

, here's a dplyr

possible solution

library(dplyr)
df %>%
  group_by(Name) %>%
  summarise(
            Mumber = sum(Number), 
            Location = toString(Location)
            )

# Source: local data table [2 x 3]
# 
#   Name Mumber     Location
# 1 RUDU     10 SiteA, SiteB
# 2 YHBL    110 SiteA, SiteB

Collapse data file by name where whole columns are summed and factor columns inserted

More articles: