Does SAS PROC SQL ever use an index when joining?

Consider the following (admittedly long) example.

The example code creates two data sets: one, with "key" variables i, j, k, and two, with key variables j, k and a "value" variable x. I would like to join these two data sets as efficiently as possible. Both data sets are indexed on j and k: the index on the first data set is not needed, but it is there anyway.

Proc SQL does not use the index on data set two, which I assume it would if the data were in a relational database. Is this just a limitation of the query optimizer that I have to accept?

EDIT: The answer to this question is yes, SAS can use an index to optimize a PROC SQL join. In the following example, the relative sizes of the data sets matter: if you change the code so that data set two is much larger than data set one, the index will be used. Whether the data sets are sorted or not is irrelevant.

* Just to control the size of the data;
%let j_max=10000;

* Create data sets;
data one;
    do i=1 to 3;
        do j=1 to &j_max;
            do k=1 to 4;
                if ranuni(0)<0.9 then output;
            end;
        end;
    end;
run;

data two;
    do j=1 to &j_max;
        do k=1 to 4;
            x=ranuni(0);
            if ranuni(0)<0.9 then output;
        end;
    end;
run;

* Create indices;
proc datasets library=work nolist;
    modify one;
    index create idx_j_k=(j k);
    modify two;
    index create idx_j_k=(j k) / unique;
run;quit;

* Test that the index on data set two can be used;
* The log should display "INFO: Index idx_j_k selected for WHERE clause optimization.";
options msglevel=i;
data _null_;
    set two(where=(j<100));
run;

* Merge the data sets with proc sql - no index is used;
proc sql;
    create table onetwo as
    select
        one.*,
        two.x
    from one, two
    where
        one.j=two.j and
        one.k=two.k;
quit;
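
The size effect described in the EDIT above can be tried with a sketch like the following. It assumes the data sets one and two and their indexes already exist as created above; the names one_small and onetwo_small and the cut-off of 50 observations are hypothetical choices made only for illustration. Whether the optimizer actually picks the index depends on its own size estimates, so the log has to be checked rather than assumed.

```sas
* Sketch only: one_small and onetwo_small are hypothetical names;
* Keep just the first 50 observations so the left table is much smaller than two;
data one_small;
    set one(obs=50);
run;

* msglevel=i prints index usage notes, _method prints the chosen join method;
options msglevel=i;
proc sql _method;
    create table onetwo_small as
    select
        one_small.*,
        two.x
    from one_small, two
    where
        one_small.j=two.j and
        one_small.k=two.k;
quit;
```

If an index join is chosen, the _method tree in the log should list sqxjndx where the full-size join lists sqxjhsh.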

      



1 answer


You may be comparing apples and oranges. For the join you are doing with proc sql, the index may not help, because the observations are already ordered by j and k, and there are faster ways to "merge" than using indexes.

For the subsetting you do in the data _null_ step, on the other hand, an index on j will certainly help. If you do the same subsetting with proc sql, you will see that it uses the index.

proc sql;
  select * from two where j < 100;
quit;
/* on log
INFO: Index idx_j_k selected for WHERE clause optimization.
*/
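
To see how much the index matters for this kind of subsetting, the indexed and the sequential read can be compared side by side. The sketch below uses the data step form from the question and the IDXWHERE= data set option; it assumes data set two and its index idx_j_k exist as created in the question. The exact wording of the log notes may differ between SAS versions, so none is quoted here.

```sas
* Sketch only: compare an indexed read with a forced sequential read of two;
options msglevel=i fullstimer;

* Let SAS decide: the log should report that idx_j_k was selected;
data _null_;
    set two(where=(j<100));
run;

* idxwhere=no tells SAS to ignore available indexes for this WHERE clause;
data _null_;
    set two(where=(j<100) idxwhere=no);
run;
```

Comparing the real time and CPU time notes for the two steps gives a rough idea of what the index saves for this particular WHERE clause.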

      



By the way, you can use the undocumented option _method to check how proc sql is executing your query. On my SAS 9.2 on Windows, it reports that it is doing a so-called "hash join":

proc sql _method;
  create table onetwo as
  select
    one.*,
    two.x
  from one, two
  where
    one.j=two.j and
    one.k=two.k;
quit;

/* on log
NOTE: SQL execution methods chosen are:

  sqxcrta
      sqxjhsh
          sqxsrc( WORK.ONE )
          sqxsrc( WORK.TWO )
*/

      

See Paul Kent's technical note for details.
