Collapse data file by name where whole columns are summed and factor columns inserted

I am trying to fold a dataframe using unique names that contain both optional integers and factor columns. The integer column must be summed for unique "Name" values, and the factor column must have all the values ​​inserted next to each other as follows:

Name        Number         Location
RUDU          6               SiteA
RUDU          4               SiteB
YHBL          50              SiteA
YHBL          60              SiteB

      

Collapsed by name:

Name        Number         Location
RUDU          10              SiteA,SiteB
YHBL          110             SiteA,SiteB

      

I've played around a bit with ddplyr, which works for the whole part, but I don't understand how to aggregate the factorial part of my query.

+3


source to share


3 answers


An approach is possible here data.table

library(data.table)
setDT(df)[, list(Mumber = sum(Number), Location = toString(Location)), by = Name]
#    Name Mumber     Location
# 1: RUDU     10 SiteA, SiteB
# 2: YHBL    110 SiteA, SiteB

      



And as you mentioned plyr

, here's a dplyr

possible solution

library(dplyr)
df %>%
  group_by(Name) %>%
  summarise(
            Mumber = sum(Number), 
            Location = toString(Location)
            )

# Source: local data table [2 x 3]
# 
#   Name Mumber     Location
# 1 RUDU     10 SiteA, SiteB
# 2 YHBL    110 SiteA, SiteB

      

+6


source


dplyr

library(dplyr)
d %>% 
    group_by_(~Name) %>% 
    summarize_(Number=~sum(Number), Location=~paste(Location, collapse=','))

      



base R

merge(aggregate(Number ~ Name, data=d, FUN=sum), aggregate(Location ~ Name, data=d, FUN=paste, collapse=','))

      

+2


source


Two more methods for reference.

Function: tapply ()

data.frame(
  Number = with(df1, tapply(Number, Name, sum)),
  Location = with(df1, tapply(Location, Name, toString))                  
)

#      Number     Location
# RUDU     10 SiteA, SiteB
# YHBL    110 SiteA, SiteB

      

Function: by ()

data.frame(cbind(
  Number = with(df1, by(Number, Name, sum)),
  Location =  with(df1, by(Location, Name, toString))
  )
)

#      Number     Location
# RUDU     10 SiteA, SiteB
# YHBL    110 SiteA, SiteB

      

Data

# df1 <- read.table(text='Name        Number         Location
# RUDU          6               SiteA
# RUDU          4               SiteB
# YHBL          50              SiteA
# YHBL          60              SiteB', header=T)

      

+1


source







All Articles