Split data frame by country and create a linear regression model for each subset
I have a data.frame of data from the World Bank that looks something like this:
country date BirthRate US.
4 Aruba 2011 10.584 25354.8
5 Aruba 2010 10.804 24289.1
6 Aruba 2009 11.060 24639.9
7 Aruba 2008 11.346 27549.3
8 Aruba 2007 11.653 25921.3
9 Aruba 2006 11.977 24015.4
In total, there are 70 unique countries in this data frame that I would like to run a linear regression on.
If I use the following, I get a nice lm for a specific country:
andora = subset(high.sub, country == "Andorra")
andora.lm = lm(BirthRate~US., data = andora)
anova(andora.lm)
summary(andora.lm)
But when I try to use the same type of code in a for loop, I get the error printed below the code:
high.sub = subset(highInc, date > 1999 & date < 2012)
high.sub <- na.omit(high.sub)
highnames <- unique(high.sub$country)
for (i in highnames) {
linmod <- lm(BirthRate~US., data = high.sub, subset = (country == "[i]"))
}
Error message:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
If I can get this loop running, I would ideally like to add the coefficients, and even better the r-squared values, for each model to an empty data.frame. Any help would be greatly appreciated.
Thanks,
Josh
2 answers
This is a slight modification of @BondedDust's comment.
models <- sapply(unique(as.character(df$country)),
function(cntry)lm(BirthRate~US.,df,subset=(country==cntry)),
simplify=FALSE,USE.NAMES=TRUE)
# to summarize all the models
lapply(models,summary)
# to run anova on all the models
lapply(models,anova)
This creates a named list of models, so you can retrieve the model for Aruba as:
models[["Aruba"]]
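The question also asks for the coefficients and r-squared values of each model in a data.frame. Below is a minimal sketch of one way to do that from the named list. It assumes toy data: the Aruba rows are taken from the question, while "Exampleland" and its numbers are invented purely for illustration. The column names `intercept`, `slope`, and `r.squared` in the result are arbitrary choices. As for the original loop, `country == "[i]"` compares against the literal string "[i]", which matches no rows and so gives the "0 (non-NA) cases" error; `country == i` would compare against the loop variable instead.

```r
# Toy data: Aruba rows are from the question; "Exampleland" is made up.
df <- data.frame(
  country   = rep(c("Aruba", "Exampleland"), each = 6),
  date      = rep(2011:2006, times = 2),
  BirthRate = c(10.584, 10.804, 11.060, 11.346, 11.653, 11.977,
                12.1, 12.4, 12.3, 12.9, 13.2, 13.0),
  US.       = c(25354.8, 24289.1, 24639.9, 27549.3, 25921.3, 24015.4,
                41000, 40200, 39500, 39900, 38700, 38100)
)

# One lm per country, stored in a named list (same approach as above).
models <- sapply(unique(as.character(df$country)),
                 function(cntry) lm(BirthRate ~ US., df, subset = (country == cntry)),
                 simplify = FALSE, USE.NAMES = TRUE)

# Build one row per model, then stack the rows into a single data.frame.
results <- do.call(rbind, lapply(names(models), function(nm) {
  fit <- models[[nm]]
  data.frame(country   = nm,
             intercept = unname(coef(fit)["(Intercept)"]),
             slope     = unname(coef(fit)["US."]),
             r.squared = summary(fit)$r.squared)
}))

results
```

Building the rows with `lapply` and binding them once avoids growing an empty data.frame inside a loop, which tends to be slower and more error-prone.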