How to use coef() on the output of do() from dplyr

Question

My question is almost answered by "dplyr 0.3.0.9000 how to use do() correctly", but not quite.

I have some data that looks like this:

> head(myData)
   Sequence Index  xSamples ySamples
6         0     5 0.3316187 3.244171
7         0     6 1.5131778 2.719893
8         0     7 1.9088933 3.122991
9         0     8 2.7940244 3.616815
10        0     9 3.6500311 3.519641

Sequence actually ranges from 0 to 9999. Within each sequence, both xSamples and ySamples should be linear with respect to Index. The plan is to group myData by Sequence and then run lm() through do() on each group. The code looks something like this (shamelessly lifted from the help):

library(dplyr)
myData_by_sequence <- group_by(myData, Sequence)
models <- myData_by_sequence %>% do(mod = lm(xSamples ~ Index, data = .))

This works, but the result I get is ...

> head(models)
Source: local data frame [10000 x 2]

  Sequence     mod
1        0 <S3:lm>
2        1 <S3:lm>
3        2 <S3:lm>
4        3 <S3:lm>
5        4 <S3:lm>
6        5 <S3:lm>

... and the data I want is stuck in that second column. I have a working plyr solution that looks like this ...

library(plyr)
models <- dlply(myData, "Sequence", function(df) lm(xSamples ~ Index, data = df))
xresult <- ldply(models, coef)

... and it gives me the results nicely broken out into a data frame, thanks to coef().

The catch: I can't mix dplyr (which I usually use and love) with plyr, and I can't get coef() to work on that second column of the dplyr output.

I've tried a few other approaches, such as combining the coef() and lm() steps. I can also pull the second column out as a list of linear models, but I can't use do() on a list.

It really seems to me that there is something obvious that I am missing here. R is definitely not my primary language. Any help would be appreciated.

Edit: I tried this ...

result <-
    rects %>% 
    group_by(Sequence) %>% 
    do(data.frame(Coef = coef(lm(xSamples ~ Frame, data = .))))

... and got something very close, but with the coefficients stacked in one column:

  Sequence       Coef
1        0 -5.0189823
2        0  1.0004240
3        1 -4.9411745
4        1  0.9981858

r dplyr lm

timbo 21 jul. 15 at 14:41

1 answer

akrun · Accepted Answer · 2015-07-21T14:58:48+0000

Try

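library(dplyr)
# as.list() turns the named coefficient vector into one column per coefficient,
# so each group contributes a single row
myData %>%
      group_by(Sequence) %>%
      do(data.frame(setNames(as.list(coef(lm(xSamples~Index, data=.))),
                 c('Intercept', 'Index'))))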
#    Sequence Intercept     Index
#1        0 -3.502821 0.7917671
#2        1  3.071611 0.3226020
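
The reason the as.list() step matters can be seen on a single group in base R. The sketch below is only an illustration: the names one_group, cf, stacked and wide are made up here, and the values are the Sequence 0 rows from the question. A named numeric vector passed to data.frame() gives one row per coefficient (the stacked result from the question's edit), while a named list gives one column per coefficient.

```r
# Toy data for one group (Sequence 0 from the question); the object names are illustrative
one_group <- data.frame(
  Index    = 5:9,
  xSamples = c(0.3316187, 1.5131778, 1.9088933, 2.7940244, 3.6500311)
)

# coef() returns a named numeric vector: "(Intercept)" and "Index"
cf <- coef(lm(xSamples ~ Index, data = one_group))

# A named vector becomes a single column with one row per coefficient
stacked <- data.frame(Coef = cf)

# A named list becomes a single row with one column per coefficient
wide <- data.frame(setNames(as.list(cf), c("Intercept", "Index")))

dim(stacked)  # 2 rows, 1 column
dim(wide)     # 1 row, 2 columns
```

Because do() row-binds whatever data frame each group returns, the one-row form yields one line per Sequence.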

Or using data.table

 library(data.table)
 setDT(myData)[, as.list(coef(lm(xSamples~Index))) , by = Sequence]
 #   Sequence (Intercept)     Index
 #1:        0   -3.502821 0.7917671
 #2:        1    3.071611 0.3226020

data

 myData <- structure(list(Sequence = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L,
 1L, 1L), Index = c(5L, 6L, 7L, 8L, 9L, 15L, 6L, 9L, 6L, 10L),
 xSamples = c(0.3316187, 
 1.5131778, 1.9088933, 2.7940244, 3.6500311, 7.3316187, 4.5131778, 
 9.9088933, 3.7940244, 4.6500311), ySamples = c(3.244171, 2.719893, 
 3.122991, 3.616815, 3.519641, 3.244171, 8.719893, 5.122991, 7.616815, 
 5.519641)), .Names = c("Sequence", "Index", "xSamples", "ySamples"
 ), class = "data.frame", row.names = c(NA, -10L))
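
As a side note that is not part of the original answer: newer dplyr releases mark do() as superseded. Assuming dplyr 0.8.1 or later is installed, the same idea can probably be written with group_modify(), roughly as sketched below. The helper name fit_coefs is hypothetical, and the data is the sample from above written out with data.frame().

```r
library(dplyr)

# Same sample data as above
myData <- data.frame(
  Sequence = rep(0:1, each = 5),
  Index    = c(5L, 6L, 7L, 8L, 9L, 15L, 6L, 9L, 6L, 10L),
  xSamples = c(0.3316187, 1.5131778, 1.9088933, 2.7940244, 3.6500311,
               7.3316187, 4.5131778, 9.9088933, 3.7940244, 4.6500311)
)

# Hypothetical helper: fit one group and return its coefficients as a one-row data frame
fit_coefs <- function(df) {
  cf <- coef(lm(xSamples ~ Index, data = df))
  data.frame(setNames(as.list(cf), c("Intercept", "Index")))
}

result <- myData %>%
  group_by(Sequence) %>%
  group_modify(~ fit_coefs(.x)) %>%  # .x is the data for the current group
  ungroup()

result
```

group_modify() puts the grouping column back in front of whatever the helper returns, so the result has the same Sequence, Intercept, Index layout as the do() version.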
