How to recombine a list of data frames that was split with plyr
I used the strip_splits(df) function from the plyr package to get a list of data frames. Now I want to concatenate that list back into a single data frame and add back the variables that were used to split it. The documentation for strip_splits, quoted below, makes me think this should be possible, but I cannot find a function that does it.
This is useful when you want to perform some kind of operation on every column in the dataframe, except for the variables that you used to split. These variables will be automatically added to the result when you combine all the results together.
Example:
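library(plyr)
# Split mtcars by vs and am, dropping those two columns from each piece
dfSplit <- dlply(mtcars, c("vs", "am"), strip_splits)
df <- dfSplit[[1]]
# Add a score column: the row-wise mean of the column-wise standardized values
score <- function(df) {
  df$score <- apply(apply(df, 2, scale), 1, mean, na.rm = TRUE)
  return(df)
}
dfSplit <- lapply(dfSplit, score)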
How do I concatenate the data frames in the dfSplit list back together?
Edit: The merged data frame must have the vs and am columns.
Using bind_rows() from dplyr:
library(dplyr)
bind_rows(dfSplit)
Or using base R:
do.call(rbind, dfSplit)
Which gives:
#Source: local data frame [32 x 10]
#
# mpg cyl disp hp drat wt qsec gear carb score
#1 18.7 8 360.0 175 3.15 3.440 17.02 3 2 -0.18850120
#2 14.3 8 360.0 245 3.21 3.570 15.84 3 4 0.05315376
#3 16.4 8 275.8 180 3.07 4.070 17.40 3 3 -0.15909455
#4 17.3 8 275.8 180 3.07 3.730 17.60 3 3 -0.14033030
#5 15.2 8 275.8 180 3.07 3.780 18.00 3 3 -0.16788329
#6 10.4 8 472.0 205 2.93 5.250 17.98 3 4 0.42384103
#7 10.4 8 460.0 215 3.00 5.424 17.82 3 4 0.49006288
#8 14.7 8 440.0 230 3.23 5.345 17.42 3 4 0.79264565
#9 15.5 8 318.0 150 2.76 3.520 16.87 3 2 -0.79767163
#10 15.2 8 304.0 150 3.15 3.435 17.30 3 2 -0.53819495
#.. ... ... ... ... ... ... ... ... ... ...
I have since found plyr's ldply function, which gives:
.id mpg cyl disp hp drat wt qsec gear carb score
1 0.0 18.7 8 360.0 175 3.15 3.440 17.02 3 2 -0.18850120
2 0.0 14.3 8 360.0 245 3.21 3.570 15.84 3 4 0.05315376
3 0.0 16.4 8 275.8 180 3.07 4.070 17.40 3 3 -0.15909455
4 0.0 17.3 8 275.8 180 3.07 3.730 17.60 3 3 -0.14033030
5 0.0 15.2 8 275.8 180 3.07 3.780 18.00 3 3 -0.16788329
However, the documentation leads me to think there must be a function that returns a data frame with vs and am columns (not .id).
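One possible way to get vs and am back, as a minimal sketch rather than a definitive answer: do the split, the scoring and the recombination in a single ddply call instead of going through dlply and lapply. This assumes the splitting variables are dropped from the result above because lapply keeps only the names of the list and not the split labels that dlply attached to it. The names piece and scored are only illustrative.

```r
library(plyr)

# Same scoring function as in the question
score <- function(df) {
  df$score <- apply(apply(df, 2, scale), 1, mean, na.rm = TRUE)
  df
}

# ddply splits by vs and am, applies the function to each piece,
# and adds the splitting variables back when it combines the results.
# strip_splits() removes vs and am from each piece before scoring.
scored <- ddply(mtcars, c("vs", "am"), function(piece) score(strip_splits(piece)))

head(scored)
```

The result should have vs and am as its first two columns, followed by the remaining mtcars columns and score, with no .id column.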