How to use coef() on the output of do() from dplyr
My question is almost answered by "dplyr 0.3.0.9000 how to use do() correctly", but not quite.
I have some data that looks like this:
> head(myData)
   Sequence Index  xSamples ySamples
6         0     5 0.3316187 3.244171
7         0     6 1.5131778 2.719893
8         0     7 1.9088933 3.122991
9         0     8 2.7940244 3.616815
10        0     9 3.6500311 3.519641
Sequence actually ranges from 0 to 9999. Within each sequence, both xSamples and ySamples should be linear with respect to Index. The plan is to group myData by Sequence and then run lm() through do() on each group. The code looks something like this (shamelessly lifted from the help):
library(dplyr)
myData_by_sequence <- group_by(myData, Sequence)
models <- myData_by_sequence %>% do(mod = lm(xSamples ~ Index, data = .))
It works, but the result I get is...
> head(models)
Source: local data frame [10000 x 2]
  Sequence     mod
1        0 <S3:lm>
2        1 <S3:lm>
3        2 <S3:lm>
4        3 <S3:lm>
5        4 <S3:lm>
6        5 <S3:lm>
...and the data I want is stuck in that second column. I have a working plyr solution that looks like this...
models <- dlply(myData, "Sequence", function(df) lm(xSamples ~ Index, data = df))
xresult <- ldply(models, coef)
...and it gives me the results broken out into a data frame thanks to coef(). The catch: I can't mix dplyr (which I usually use and love) with plyr, and I can't get coef() to work on that second column of dplyr's output.
I've tried a few other approaches, such as combining the coef() and lm() steps, and I can split the second column out into a list of linear models, but I can't use do() on a list.
It really feels like there is something obvious that I'm missing here. R is definitely not my primary language. Any help would be appreciated.
Edit: I tried this...
result <-
rects %>%
group_by(Sequence) %>%
do(data.frame(Coef = coef(lm(xSamples ~ Frame, data = .))))
...and got something very close, but with the coefficients stacked in one column:
  Sequence       Coef
1        0 -5.0189823
2        0  1.0004240
3        1 -4.9411745
4        1  0.9981858
Try
library(dplyr)
myData %>%
group_by(Sequence) %>%
do(data.frame(setNames(as.list(coef(lm(xSamples~Index, data=.))),
c('Intercept', 'Index'))))
# Sequence Intercept Index
#1 0 -3.502821 0.7917671
#2 1 3.071611 0.3226020
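Why the as.list() step matters can be seen on a single group using base R only. This is a minimal sketch with made-up toy values (the data frame `df` and the names `stacked` and `wide` are illustrative, not from the question): coef() returns a named numeric vector, so wrapping it directly in data.frame() gives one row per coefficient, while converting it to a list first gives one column per coefficient.

```r
# Toy data for one group; the values are illustrative assumptions
df <- data.frame(Index = 1:5,
                 xSamples = c(2.1, 3.9, 6.2, 7.8, 10.1))
fit <- lm(xSamples ~ Index, data = df)

# coef() returns a named numeric vector, one element per coefficient
cf <- coef(fit)

# Wrapping the vector directly: one ROW per coefficient (the stacked result)
stacked <- data.frame(Coef = cf)

# Converting to a list first: one COLUMN per coefficient, in a single row
wide <- data.frame(setNames(as.list(cf), c("Intercept", "Index")))

dim(stacked)  # 2 rows, 1 column
dim(wide)     # 1 row, 2 columns
```

Inside do(), that single-row data frame is what gets bound together across groups, which is presumably why the result ends up with one row per Sequence.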
Or, using data.table:
library(data.table)
setDT(myData)[, as.list(coef(lm(xSamples~Index))) , by = Sequence]
# Sequence (Intercept) Index
#1: 0 -3.502821 0.7917671
#2: 1 3.071611 0.3226020
data
myData <- structure(list(Sequence = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L,
1L, 1L), Index = c(5L, 6L, 7L, 8L, 9L, 15L, 6L, 9L, 6L, 10L),
xSamples = c(0.3316187,
1.5131778, 1.9088933, 2.7940244, 3.6500311, 7.3316187, 4.5131778,
9.9088933, 3.7940244, 4.6500311), ySamples = c(3.244171, 2.719893,
3.122991, 3.616815, 3.519641, 3.244171, 8.719893, 5.122991, 7.616815,
5.519641)), .Names = c("Sequence", "Index", "xSamples", "ySamples"
), class = "data.frame", row.names = c(NA, -10L))