Pandas 0.14.1 StataReader - reading .dta files
I am trying to import a large dataset from Stata 13 into pandas using StataReader. This worked fine with pandas 0.13.1, but after I updated to 0.14.1, reading .dta files became dramatically slower. Does anyone know what changed and/or how to get around this? I couldn't find any changes to StataReader in the What's New section of the pandas website.
Steps to reproduce my problem:
- Create a large dataset in Stata 13:

    clear
    set obs 11500
    forvalues i = 1/8000 {
        gen var`i' = 1
    }
    saveold bigdataset, replace
- Try to read it in pandas using StataReader:

    from pandas.io.stata import StataReader
    reader = StataReader('bigdataset.dta')
    data = reader.data()
With pandas 0.13.1 this takes about 220 seconds, which is acceptable, but with pandas 0.14.1 the read had still not finished after about 20 minutes.
When I test this problem with a smaller dataset:
- Create a smaller dataset in Stata 13:

    clear
    set obs 11500
    forvalues i = 1/1000 {
        gen var`i' = 1
    }
    saveold smalldataset, replace
- Try to read it in pandas using StataReader:

    from pandas.io.stata import StataReader
    reader = StataReader('smalldataset.dta')
    data = reader.data()
Using pandas 0.13.1 it takes about 20 seconds, but using pandas 0.14.1 it takes about 300 seconds.
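For anyone who wants to compare timings like these across pandas versions, here is a minimal sketch of a timing wrapper around the same read. The helper names `time_call` and `read_dta` are made up for illustration and are not part of pandas; the read itself uses only the `StataReader` calls shown above, and the file name is the one from the smaller example.

```python
import os
from timeit import default_timer  # wall-clock timer, works on Python 2 and 3


def time_call(func, *args, **kwargs):
    """Run func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = default_timer()
    result = func(*args, **kwargs)
    return result, default_timer() - start


def read_dta(path):
    """Read a .dta file with the same calls used in the question."""
    # Imported here so that time_call can be used without pandas installed.
    from pandas.io.stata import StataReader
    reader = StataReader(path)
    return reader.data()


if __name__ == "__main__":
    path = "smalldataset.dta"  # file created by the Stata snippet above
    if os.path.exists(path):
        import pandas as pd
        data, seconds = time_call(read_dta, path)
        print("pandas %s: read %s in %.1f s" % (pd.__version__, path, seconds))
```

Running the same script under each pandas version prints the version next to the elapsed time, so the two numbers can be compared directly.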
I would really like to upgrade to a newer version of pandas and work with my data, which is the size of bigdataset.dta. Does anyone know how I can import my data efficiently?