Does SAS PROC SQL ever use an index when merging?

Question

Consider the following (admittedly long) example.

The example code creates two data sets: ONE, with "key" variables i, j and k, and TWO, with key variables j and k and a "value" variable x. I would like to join these two data sets as efficiently as possible. Both data sets are indexed on j and k. The index on ONE should not be needed, but it is there anyway.

PROC SQL does not use the index on TWO, which I would expect it to do if the data were in a relational database. Is this just a limitation of the query optimizer that I have to accept?

EDIT: The answer to this question is yes: SAS can use an index to optimize a PROC SQL join. In the following example, the relative sizes of the data sets matter: if you change the code so that data set TWO is large relative to data set ONE, the index will be used. Whether the data sets are sorted or not is irrelevant.

* Just to control the size of the data;
%let j_max=10000;

* Create data sets;
data one;
    do i=1 to 3;
        do j=1 to &j_max;
            do k=1 to 4;
                if ranuni(0)<0.9 then output;
            end;
        end;
    end;
run;

data two;
    do j=1 to &j_max;
        do k=1 to 4;
            x=ranuni(0);
            if ranuni(0)<0.9 then output;
        end;
    end;
run;

* Create indices;
proc datasets library=work nolist;
    modify one;
    index create idx_j_k=(j k);
    modify two;
    index create idx_j_k=(j k) / unique;
run;quit;

* Check that the index is used when subsetting data set TWO;
* Log should display "INFO: Index idx_j_k selected for WHERE clause optimization.";
options msglevel=i;
data _null_;
    set two(where=(j<100));
run;

* Merge the data sets with proc sql - no index is used;
proc sql;
    create table onetwo as
    select
        one.*,
        two.x
    from one, two
    where
        one.j=two.j and
        one.k=two.k;
quit;
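The edit above says the index is used once TWO is large relative to ONE. A minimal way to try that with the same data is to shrink the driving table and rerun the join with the undocumented _method option, which prints the execution plan. This is a sketch under assumptions: WORK.ONE, WORK.TWO and idx_j_k come from the code above, the names ONE_SMALL and ONETWO_SMALL are hypothetical, and the cut-off of 100 values of j is arbitrary. If the edit holds on your release, the plan may show an index join (sqxjndx) instead of a hash join (sqxjhsh).

```sas
* Make the driving table small relative to the indexed table TWO;
* ONE_SMALL and ONETWO_SMALL are hypothetical names, the cut-off is arbitrary;
data one_small;
    set one(where=(i=1 and j<=100));
run;

* Show index messages and the chosen join method in the log;
options msglevel=i;
proc sql _method;
    create table onetwo_small as
    select
        one_small.*,
        two.x
    from one_small, two
    where
        one_small.j=two.j and
        one_small.k=two.k;
quit;
```

Only the plan should differ from the full join. The result is still an ordinary inner join restricted to the rows kept in ONE_SMALL.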


performance sql indexing sas

Ville Koskinen, Oct 12 '09 at 8:25


1 answer

Chang Chung · Accepted Answer · 2009-10-12T14:08:24+0000

You are comparing apples and oranges. For the join you are doing with PROC SQL, the index may not help, because the observations are already ordered by j and k, and there are faster ways to "merge" than using indexes.

For the subset in your DATA _NULL_ step, on the other hand, an index on j will certainly help. If you do the same subset with PROC SQL, you will see that it uses an index.

proc sql;
  select * from two where j < 100;
quit;
/* on log
INFO: Index idx_j_k selected for WHERE clause optimization.
*/

By the way, you can use the undocumented _method option to see how PROC SQL is executing your query. On my SAS 9.2 on Windows, it reports that it is doing a so-called "hash join":

proc sql _method;
  create table onetwo as
  select
    one.*,
    two.x
  from one, two
  where
    one.j=two.j and
    one.k=two.k;
quit;

/* on log
NOTE: SQL execution methods chosen are:

  sqxcrta
      sqxjhsh
          sqxsrc( WORK.ONE )
          sqxsrc( WORK.TWO )
*/
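A related experiment with the same _method option is to name the index explicitly with the IDXNAME= data set option on TWO. IDXNAME= and IDXWHERE= are documented data set options for controlling index use. Whether PROC SQL honors the hint for a join, and not only for WHERE subsetting, is an assumption to verify in the _method output on your release (look for sqxjndx, the index join, instead of sqxjhsh). The output name ONETWO_IDX is hypothetical.

```sas
* Ask PROC SQL to consider the index idx_j_k on TWO for the join;
* Whether an index join is actually chosen must be checked in the _method output;
* ONETWO_IDX is a hypothetical output name;
options msglevel=i;
proc sql _method;
    create table onetwo_idx as
    select
        one.*,
        two.x
    from one, two(idxname=idx_j_k)
    where
        one.j=two.j and
        one.k=two.k;
quit;
```

The hint changes at most the plan, not the result, so ONETWO_IDX should contain the same rows as ONETWO.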

See Paul Kent's technical note for details.
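If the goal is simply to guarantee that the index on TWO is used, one option outside PROC SQL is a DATA step that reads ONE sequentially and looks up each (j, k) pair in TWO through the index with the KEY= option of the SET statement. This is a minimal sketch, assuming WORK.ONE, WORK.TWO and the unique index idx_j_k were created by the code in the question. The output name ONETWO_KEY is hypothetical.

```sas
* Index lookup join: read ONE sequentially and fetch the matching row of TWO
  through the unique index idx_j_k, using the current values of j and k;
* Assumes WORK.ONE, WORK.TWO and the index from the code in the question;
* ONETWO_KEY is a hypothetical output name;
data onetwo_key;
    set one;                              /* driver table, read in order */
    set two key=idx_j_k / unique;         /* indexed lookup on (j, k) */
    if _iorc_ = %sysrc(_sok) then output; /* match found: keep row (inner join) */
    else _error_ = 0;                     /* no match: clear error flag, drop row */
run;
```

The /UNIQUE option makes every lookup start from the top of the index, which is the safe choice when consecutive observations of ONE can repeat a key. Resetting _ERROR_ keeps failed lookups from dumping the program data vector to the log.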
