Best way to get required external data for R package
I have an R package, gender, that predicts gender from first names. This requires several fairly large datasets, which I have placed in a separate R package, genderdata. Ideally, the gender package would depend on the genderdata package, and both would be accepted by CRAN. However, it looks like CRAN will not accept genderdata because it is too large (26 MB). (I think anything of 5 MB or more counts as "big data".)
So my question is this: what is the best way to get this data into my gender package if I can't list the genderdata package under Imports: in the DESCRIPTION file?
My thought is to depend on devtools and provide a function like this:
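install_gender_data <- function() {
  # Install the data package from GitHub only if it is not already installed
  if (!requireNamespace("genderdata", quietly = TRUE)) {
    devtools::install_github("lmullen/gender-data-pkg")
  }
}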
Then I would also use .onLoad() to show a package startup message telling users to run this function if they have not already installed genderdata.
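To make the startup check concrete, here is a minimal sketch of what I have in mind. The helper names has_gender_data() and gender_data_hint() are hypothetical, and I am assuming genderdata would be listed under Suggests: rather than Imports:. The sketch puts the message in .onAttach() rather than .onLoad(), since R's documentation recommends .onAttach() with packageStartupMessage() for startup messages.

```r
# Hypothetical helper: TRUE if the optional data package is installed.
has_gender_data <- function(pkg = "genderdata") {
  requireNamespace(pkg, quietly = TRUE)
}

# Hypothetical helper: build the hint for users, or NULL if none is needed.
gender_data_hint <- function(pkg = "genderdata") {
  if (has_gender_data(pkg)) return(NULL)
  paste0("The '", pkg, "' package is not installed. ",
         "Run install_gender_data() to install it.")
}

# Startup hook: runs when the package is attached with library(gender).
.onAttach <- function(libname, pkgname) {
  hint <- gender_data_hint()
  if (!is.null(hint)) packageStartupMessage(hint)
}
```

Using packageStartupMessage() means users can silence the hint with suppressPackageStartupMessages(library(gender)).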
Have a look at Hadley Wickham's "babynames" package. http://cran.r-project.org/web/packages/babynames/index.html