Parallel loading odbc in Stata

I am trying to parallelize loading data over ODBC connection to Stata using a parallel command from SSC.

Here is a sample file myloop.do that iterates over two IDs:

#delimit;

foreach id in 77860668 1124831359 {;
    odbc load, exec(`"
    SELECT
        user_slctd_id
        ,user_id
        FROM dw_table
    WHERE user_id =`id'
    "') dsn("my_dsn") lowercase sqlshow clear;

    save "id_`id'.dta", replace;
};

      

It works well. However, when I try to run this code in parallel, follow these steps:

parallel setclusters 2
parallel do "myloop.do"

      

It hangs on "Waiting for clusters to complete".

How do I diagnose what's going on? Is there a better way to get data in parallel and add all files together?

I am using parallel module version 1.14.6.17 17jun2014.

+3


source to share


2 answers


If you want to know what's going on with the concurrent behind the scenes, you should use the keep option, like this:

parallel setclusters 2
parallel do "myloop.do", keep

      

This will mean that it should not delete auxiliary files (among them the log file).



Also, if you do this, you can always check out the latest version in the project repo http://github.com/gvegayon/parallel

Please let me know if this helps.

+2


source


Which ODBC driver are you using? What type of database?

This is almost like a situation where a driver can only handle one statement per connection. It's hard to tell without knowing what's going on at the driver level.



I would suggest creating an ODBC trace to see what the ODBC calls are doing while hanging. Check for any underlying errors that might be the root cause of the "wait" message.

Check your driver documentation for the specific surrounding number of connection and stream statements; in case this could be a factor.

+2


source







All Articles