# Cumulative Linear Regression

Sorry, I'm new to R. I have a data frame with game logs for multiple players, and I am trying to get the slope coefficient for each player's points across all of their games. I've seen that `aggregate` can apply functions like `sum` and `mean`, and getting the coefficients from a linear regression is pretty easy. How do I combine the two?

```
a <- c("player1","player1","player1","player2","player2","player2")
b <- c(1,2,3,4,5,6)
c <- c(15,12,13,4,15,9)
gamelogs <- data.frame(name=a, game=b, pts=c)
```

I want this to become:

```
name     pts slope
player1  -.4286
player2  .08242
```


You can also do everything at once with a single base `lm` call:

```
# Both formulas fit the same model; elements 3 and 4 are the per-player slopes
coef(lm(game ~ pts*name - pts, data=gamelogs))[3:4]
coef(lm(game ~ pts:name + name, data=gamelogs))[3:4]
#pts:nameplayer1 pts:nameplayer2
#    -0.42857143      0.08241758
```

As a `data.frame`:

```
data.frame(slope=coef(lm(game ~ pts*name - pts, data=gamelogs))[3:4])
#                      slope
#pts:nameplayer1 -0.42857143
#pts:nameplayer2  0.08241758
```
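If you want the two-column layout from the question, one option is to pick the interaction coefficients by name rather than by position and strip the `pts:name` prefix. This is only a sketch: `fit`, `cf`, `slopes`, and `result` are illustrative names, not part of the answer above.

```r
# Example data from the question
gamelogs <- data.frame(
  name = c("player1", "player1", "player1", "player2", "player2", "player2"),
  game = c(1, 2, 3, 4, 5, 6),
  pts  = c(15, 12, 13, 4, 15, 9)
)

fit <- lm(game ~ pts:name + name, data = gamelogs)

# Select the per-player slopes by coefficient name instead of [3:4]
cf <- coef(fit)
slopes <- cf[grepl("^pts:name", names(cf))]

# Strip the "pts:name" prefix to recover the player names
result <- data.frame(
  name  = sub("^pts:name", "", names(slopes)),
  slope = unname(slopes)
)
result
```

Selecting by name should keep working if more players are added, whereas `[3:4]` is tied to exactly two players.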

See here for further explanation of the formula notation used in `lm` calls:

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/formula.html

http://faculty.chicagobooth.edu/richard.hahn/teaching/FormulaNotation.pdf#2

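In this case, `pts*name` expands to `pts + name + pts:name`, which, once `- pts` removes the main effect, is equivalent to `pts:name + name`. With no shared `pts` slope left in the model, each `pts:name` coefficient is that player's own slope.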
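The expansion can be checked directly by asking R which terms each formula contains. A small sketch, assuming the `gamelogs` data from the question; `f1`, `f2`, `labels1`, `labels2`, `fit1`, and `fit2` are illustrative names:

```r
# Example data from the question
gamelogs <- data.frame(
  name = c("player1", "player1", "player1", "player2", "player2", "player2"),
  game = c(1, 2, 3, 4, 5, 6),
  pts  = c(15, 12, 13, 4, 15, 9)
)

f1 <- game ~ pts*name - pts
f2 <- game ~ pts:name + name

# Term labels that remain after R expands each formula
labels1 <- attr(terms(f1), "term.labels")
labels2 <- attr(terms(f2), "term.labels")
same_terms <- setequal(labels1, labels2)

# Fit both and compare the estimated coefficients
fit1 <- lm(f1, data = gamelogs)
fit2 <- lm(f2, data = gamelogs)
same_coefs <- isTRUE(all.equal(unname(coef(fit1)), unname(coef(fit2))))

labels1
labels2
c(same_terms = same_terms, same_coefs = same_coefs)
```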


You could split the data by player and fit one model per piece:

```
s <- split(gamelogs, gamelogs$name)
# [[1]] is the coefficient vector of the fit; element 2 is the slope on pts
vapply(s, function(x) lm(game ~ pts, x)[[1]][2], 1)
#     player1     player2
# -0.42857143  0.08241758
```

or

```
do.call(rbind, lapply(s, function(x) coef(lm(game ~ pts, x))[2]))
#                 pts
# player1 -0.42857143
# player2  0.08241758
```
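As a sanity check on these numbers, the slope of a simple regression of `game` on `pts` is cov(pts, game) / var(pts), so the same values can be reproduced per player without calling `lm` at all. This is a sketch for verification only, not a replacement for the approaches above; `slopes` is an illustrative name.

```r
# Example data from the question
gamelogs <- data.frame(
  name = c("player1", "player1", "player1", "player2", "player2", "player2"),
  game = c(1, 2, 3, 4, 5, 6),
  pts  = c(15, 12, 13, 4, 15, 9)
)

# Closed-form simple-regression slope for game ~ pts, computed per player
slopes <- sapply(
  split(gamelogs, gamelogs$name),
  function(d) cov(d$pts, d$game) / var(d$pts)
)
slopes
```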

Or, if you want to use `dplyr`, you can do

```
library(dplyr)
models <- group_by(gamelogs, name) %>%
    do(mod = lm(game ~ pts, data = .))
cbind(
    name = models$name,
    do(models, data.frame(slope = coef(.$mod)[2]))
)
#      name       slope
# 1 player1 -0.42857143
# 2 player2  0.08241758
```
