Collapse data file by name where whole columns are summed and factor columns inserted
I am trying to fold a dataframe using unique names that contain both optional integers and factor columns. The integer column must be summed for unique "Name" values, and the factor column must have all the values ββinserted next to each other as follows:
Name Number Location
RUDU 6 SiteA
RUDU 4 SiteB
YHBL 50 SiteA
YHBL 60 SiteB
Collapsed by name:
Name Number Location
RUDU 10 SiteA,SiteB
YHBL 110 SiteA,SiteB
I've played around a bit with ddplyr, which works for the whole part, but I don't understand how to aggregate the factorial part of my query.
+3
source to share
3 answers
An approach is possible here data.table
library(data.table)
setDT(df)[, list(Mumber = sum(Number), Location = toString(Location)), by = Name]
# Name Mumber Location
# 1: RUDU 10 SiteA, SiteB
# 2: YHBL 110 SiteA, SiteB
And as you mentioned plyr
, here's a dplyr
possible solution
library(dplyr)
df %>%
group_by(Name) %>%
summarise(
Mumber = sum(Number),
Location = toString(Location)
)
# Source: local data table [2 x 3]
#
# Name Mumber Location
# 1 RUDU 10 SiteA, SiteB
# 2 YHBL 110 SiteA, SiteB
+6
source to share
Two more methods for reference.
Function: tapply ()
data.frame(
Number = with(df1, tapply(Number, Name, sum)),
Location = with(df1, tapply(Location, Name, toString))
)
# Number Location
# RUDU 10 SiteA, SiteB
# YHBL 110 SiteA, SiteB
Function: by ()
data.frame(cbind(
Number = with(df1, by(Number, Name, sum)),
Location = with(df1, by(Location, Name, toString))
)
)
# Number Location
# RUDU 10 SiteA, SiteB
# YHBL 110 SiteA, SiteB
Data
# df1 <- read.table(text='Name Number Location
# RUDU 6 SiteA
# RUDU 4 SiteB
# YHBL 50 SiteA
# YHBL 60 SiteB', header=T)
+1
source to share