Reading Stata files with a large number of variables in Stata and R

I am trying to read a Stata file, 320 MB in size and with over 5000 variables, in Stata and R. I tried Stata first, but the maximum number of variables it can read is 5000, so I can't use it to read the whole file. My questions:

  • Is there a way to read the file using Stata while asking it to load only selected variables (I know which ones I need), so that the number of variables stays below 5000?

  • Is there a way to read this Stata file into R? I am using 32-bit Windows Vista, and R gives me an error: "Error: cannot allocate vector of size 21k.Kb"

    ...

I used the following Stata and R code to read the file:

# The Stata file is on this page: http://www.federalreserve.gov/econresdata/scf/scf_2010survey.htm#STATADAT
# 1. Stata
set mem 400m
set maxvar 4000
use p10i6.dta, clear
keep x8166 x8167 x8168 x8163 x8164 x2422 x2506 x2606 x2623 x604 x614 x623 x716 x507  x513 x526 x1706 x1705 x1806 x1805 x1906 x1905 x2002 x2012 x1409 x1509 x1609 x1415 x1515 x1615 x1417 x1517 x1617 x1619 x1621 x3124 x3224 x3324 x3129 x3229 x3329 x3335 x3408 x3412 x3416 x3420 x3424 x3428 x4020 x4024 x4028 x4018 x4022 x4026 x4030 x4022 x4026 x4030 x4018 x3507 x3511 x3515 x3519 x3523 x3527 x3506 x3510 x3514 x3518 x3522 x3526 x3529 x3804 x3807 x3810 x3813 x3816 x3818 x3930 x3721 x3821 x3823 x3825 x3827 x3829 x3822 x3824 x3826 x3828 x3830 
save p10i6.dta, replace


# 2. R
library(foreign)
year <- 2010
yr <- substr(year, 3, 4)
p10i6.dta <- read.dta(paste0("p", yr, "i6.dta"))
saveRDS(p10i6.dta, file = paste0("p", yr, "i6.rda"))
p10i6.rda <- readRDS(paste0("p", yr, "i6.rda"))

      



2 answers


I am assuming you have Stata/IC or an older version of Stata. The current Stata/SE and Stata/MP can handle more than 32,000 variables, so the first logical step would be to upgrade to a version of Stata that can handle large datasets. That is, if the problem is not caused by a lack of available memory; the Stata error message would help determine that.

As Richard Herron said in the comments, you should read in a subset of the data using:

use X8166 X8167 ... using p10i6, clear

      

Remember that Stata is case sensitive. According to the website you linked to, the variables are named X... (uppercase) rather than x... (lowercase).



If you want to load the data into R using the foreign package, make sure R is allowed to use the maximum available memory. On Vista, that will be about 3.5 GB:

memory.limit(3500)

      

If that doesn't work, the dataset is too large to load this way, and you can instead use any of the ASCII import methods in Stata or R to load the ASCII data provided on the website.
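As a rough sketch of the ASCII route in R, the snippet below reads only selected columns from a delimited text file, so the unwanted columns are never loaded into memory. The layout of the ASCII file on the website is not described here, so this assumes a comma-delimited file with a header row. The temporary file, its toy values, and the `wanted` vector are hypothetical stand-ins for illustration only.

```r
# Toy stand-in for the real ASCII file (assumption: comma-delimited with a header row).
path <- tempfile(fileext = ".csv")
toy <- data.frame(X8166 = 1:3, X8167 = 4:6, X2422 = 7:9, X604 = c(0.5, 1.5, 2.5))
write.csv(toy, path, row.names = FALSE)

# Hypothetical subset of variables to keep.
wanted <- c("X8166", "X604")

# Read a single row just to learn the column names.
header <- names(read.csv(path, nrows = 1))

# In colClasses, "NULL" makes read.csv skip a column; NA lets it guess the type.
classes <- rep("NULL", length(header))
classes[header %in% wanted] <- NA

# Only the wanted columns are read into the data frame.
d <- read.csv(path, colClasses = classes)
str(d)
```

If the real file is fixed-width or uses another delimiter, the same idea applies with the matching reader, but that would need checking against the file's documentation.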



To read the data into R, there might be a way to do it with Stata.file from the memisc package. Instead of reading in all variables, select only the variables you need using subset. For example:



library(memisc)

?Stata.file
yr <- "10"  # two-digit survey year, as in the question's code
d1 <- subset(
        Stata.file(paste0("p", yr, "i6.dta")),
        select = c(x8166, x2606, x2623, x604)
    )

      
