Split data frame by country and create a linear regression model for each subset

Question

I have a data.frame of data from the World Bank that looks something like this:

  country date BirthRate     US.
4   Aruba 2011    10.584 25354.8
5   Aruba 2010    10.804 24289.1
6   Aruba 2009    11.060 24639.9
7   Aruba 2008    11.346 27549.3
8   Aruba 2007    11.653 25921.3
9   Aruba 2006    11.977 24015.4

In total, there are 70 countries in this data frame, and I would like to run a linear regression on each of them.

If I use the following, I get a nice lm for a specific country:

andora = subset(high.sub, country == "Andorra")

andora.lm = lm(BirthRate~US., data = andora)

anova(andora.lm)
summary(andora.lm)

But when I try to use the same kind of code in a for loop, I get the error printed below the code:

high.sub = subset(highInc, date > 1999 & date < 2012)
high.sub <- na.omit(high.sub)
highnames <- unique(high.sub$country)

for (i in highnames) {
  linmod <- lm(BirthRate~US., data = high.sub, subset = (country == "[i]"))  
}

Error message:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

If I can get this loop running, I would ideally like to append the coefficients and, even better, the r-squared values for each model to an empty data.frame. Any help would be greatly appreciated.

Thanks,

Josh

+1

r dataframe regression linear-regression linear

Josh 01 dec. 14 at 19:48

2 answers

Take a look at the lmList function from the nlme package:

library(nlme)
lmList(BirthRate ~ US. | country, df)

The | country part of the formula is used here to fit a separate regression for each individual country.
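As a rough sketch of how the per-country coefficients could then be pulled out of the lmList fit, the example below uses coef(), which returns one row per group. The toy data frame is made up purely for illustration and only mimics the column names from the question; it is not the World Bank data.

```r
library(nlme)

# Toy stand-in data (invented for illustration, not the World Bank data).
set.seed(1)
df <- data.frame(
  country = factor(rep(c("Aruba", "Andorra", "Austria"), each = 6)),
  US.     = rep(c(20, 22, 24, 26, 28, 30), times = 3)
)
df$BirthRate <- 15 - 0.1 * df$US. + rnorm(nrow(df), sd = 0.2)

# One regression per country.
fits <- lmList(BirthRate ~ US. | country, data = df)

# coef() on an lmList gives a data frame: one row per country,
# one column per coefficient (intercept and slope here).
coefs <- coef(fits)
print(coefs)
```

The row names of coefs are the country names, so a single country's coefficients can be looked up with coefs["Aruba", ].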

+2

Sven Hohenstein 03 dec. 14 at 14:53

jlhoward · Accepted Answer · 2014-12-01T20:13:34+0000

This is a slight modification of @BondedDust's comment.

models <- sapply(unique(as.character(df$country)),
                 function(cntry)lm(BirthRate~US.,df,subset=(country==cntry)),
                 simplify=FALSE,USE.NAMES=TRUE)

# to summarize all the models
lapply(models,summary)
# to run anova on all the models
lapply(models,anova)

This creates a named list of models, so you can retrieve the model for Aruba as:

models[["Aruba"]]
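The question also asks for the coefficients and r-squared values collected in a data.frame. One possible way to do that from the named list of models is sketched below. The toy data and the column names of the results table (intercept, slope, r.squared) are assumptions made for illustration; only BirthRate, US. and country come from the question.

```r
# Toy stand-in data (invented for illustration, not the World Bank data).
set.seed(1)
df <- data.frame(
  country = rep(c("Aruba", "Andorra", "Austria"), each = 6),
  US.     = rep(c(20, 22, 24, 26, 28, 30), times = 3)
)
df$BirthRate <- 15 - 0.1 * df$US. + rnorm(nrow(df), sd = 0.2)

# Named list of models, one per country (same approach as above).
models <- sapply(unique(as.character(df$country)),
                 function(cntry) lm(BirthRate ~ US., df, subset = (country == cntry)),
                 simplify = FALSE, USE.NAMES = TRUE)

# Build one row per model, then stack the rows into a single data frame.
results <- do.call(rbind, lapply(names(models), function(nm) {
  m <- models[[nm]]
  data.frame(country   = nm,
             intercept = unname(coef(m)[1]),
             slope     = unname(coef(m)["US."]),
             r.squared = summary(m)$r.squared)
}))
print(results)
```

For reference, the loop in the question fails because country == "[i]" compares against the literal string "[i]", which matches no rows; comparing against the loop variable itself (country == i) avoids the "0 (non-NA) cases" error.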
