How to use coef() on the output of do() from dplyr

My question is almost answered by "dplyr 0.3.0.9000: how to use do() correctly", but not quite.

I have some data that looks like this:

> head(myData)
   Sequence Index  xSamples ySamples
6         0     5 0.3316187 3.244171
7         0     6 1.5131778 2.719893
8         0     7 1.9088933 3.122991
9         0     8 2.7940244 3.616815
10        0     9 3.6500311 3.519641

      

Sequence actually ranges from 0 to 9999. Within each sequence, both xSamples and ySamples should be linear with respect to Index. The plan is to group myData by Sequence and then run lm() through do() for each group. The code looks something like this (shamelessly lifted from the help):

library(dplyr)
myData_by_sequence <- group_by(myData, Sequence)
models <- myData_by_sequence %>% do(mod = lm(xSamples ~ Index, data = .))

      

It works, but the result I get is ...

> head(models)
Source: local data frame [10000 x 2]

  Sequence     mod
1        0 <S3:lm>
2        1 <S3:lm>
3        2 <S3:lm>
4        3 <S3:lm>
5        4 <S3:lm>
6        5 <S3:lm>

      

... and the data I want is stuck in that second column. I have a working plyr solution that looks like this ...

models <- dlply(myData, "Sequence", function(df) lm(xSamples ~ Index, data = df))
xresult <- ldply(models, coef)

      

... and it gives me the results broken out into a data frame, thanks to coef(). The catch: I would rather not mix dplyr (which I normally use and like) with plyr, and I can't get coef() to work on that second column of dplyr's output.

I've tried a few other approaches, such as combining the coef() and lm() steps, and I can pull the second column out as a list of linear models, but I can't use do() on a list.

It really seems to me that there is something obvious that I am missing here. R is definitely not my primary language. Any help would be appreciated.

Edit: I tried this ...

result <-
    rects %>% 
    group_by(Sequence) %>% 
    do(data.frame(Coef = coef(lm(xSamples ~ Frame, data = .))))

      

... and I get something very close, but with the coefficients stacked in one column:

  Sequence       Coef
1        0 -5.0189823
2        0  1.0004240
3        1 -4.9411745
4        1  0.9981858

      



1 answer


Try

library(dplyr)
myData %>%
      group_by(Sequence) %>%
      # as.list() turns the named coefficient vector into one column per coefficient
      do(data.frame(setNames(as.list(coef(lm(xSamples~Index, data=.))),
                 c('Intercept', 'Index'))))
#    Sequence Intercept     Index
#1        0 -3.502821 0.7917671
#2        1  3.071611 0.3226020
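
The reason the question's edit stacks the coefficients while the version above spreads them across columns comes down to how data.frame() treats a named vector versus a list. A small base R sketch of the difference, using only the first sequence from the sample data (the names d, cf, long and wide are just for illustration):

```r
# First sequence from the question's sample data
d <- data.frame(
  Index    = 5:9,
  xSamples = c(0.3316187, 1.5131778, 1.9088933, 2.7940244, 3.6500311)
)

# coef() returns a named numeric vector: "(Intercept)" and "Index"
cf <- coef(lm(xSamples ~ Index, data = d))

# A vector becomes a single column, one row per coefficient (stacked)
long <- data.frame(Coef = cf)

# A list becomes one column per element, all in a single row
wide <- data.frame(as.list(cf))

dim(long)  # 2 rows, 1 column
dim(wide)  # 1 row, 2 columns
```

By default data.frame() also rewrites column names that are not syntactically valid, such as "(Intercept)", which is one reason to set the names explicitly with setNames() before building the data frame.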

      

Or using data.table



 library(data.table)
 setDT(myData)[, as.list(coef(lm(xSamples~Index))) , by = Sequence]
 #   Sequence (Intercept)     Index
 #1:        0   -3.502821 0.7917671
 #2:        1    3.071611 0.3226020
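
If the fitted models themselves are still needed, the coefficients can probably also be pulled out of the list column afterwards. This is only a sketch: it assumes models is built with do(mod = lm(...)) as in the question, so that models$mod is an ordinary list of lm objects, and it uses base R for the extraction. The names coefs and result are just for illustration.

```r
library(dplyr)

# Sample data, same values as in the "data" section (ySamples omitted)
myData <- data.frame(
  Sequence = rep(0:1, each = 5),
  Index    = c(5:9, 15L, 6L, 9L, 6L, 10L),
  xSamples = c(0.3316187, 1.5131778, 1.9088933, 2.7940244, 3.6500311,
               7.3316187, 4.5131778, 9.9088933, 3.7940244, 4.6500311)
)

# One lm object per Sequence, stored in the list column "mod"
models <- myData %>%
  group_by(Sequence) %>%
  do(mod = lm(xSamples ~ Index, data = .))

# Apply coef() to each model and bind the named vectors into a matrix
coefs <- do.call(rbind, lapply(models$mod, coef))

# check.names = FALSE keeps the "(Intercept)" column name as it is
result <- data.frame(Sequence = models$Sequence, coefs, check.names = FALSE)
result
```

This keeps the lm objects around for other uses such as summary() or predict(), at the cost of an extra step compared with computing the coefficients directly inside do().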

      

data

 myData <- structure(list(Sequence = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L,
 1L, 1L), Index = c(5L, 6L, 7L, 8L, 9L, 15L, 6L, 9L, 6L, 10L),
 xSamples = c(0.3316187, 
 1.5131778, 1.9088933, 2.7940244, 3.6500311, 7.3316187, 4.5131778, 
 9.9088933, 3.7940244, 4.6500311), ySamples = c(3.244171, 2.719893, 
 3.122991, 3.616815, 3.519641, 3.244171, 8.719893, 5.122991, 7.616815, 
 5.519641)), .Names = c("Sequence", "Index", "xSamples", "ySamples"
 ), class = "data.frame", row.names = c(NA, -10L))

      
