How to import Excel data into SAS

I have an Excel file (.xlsx) that has the column names on the 4th row, and the data start on the 5th row. I'm not sure which options to use in Proc Import in SAS to read this data. Please help. Thanks.



2 answers


I solved a similar problem in SAS 9.2 by importing in two passes: one to explore the worksheet and one to extract the data.

This is a generalization of what I did there, but excuse me for posting code that I haven't tested: I don't have SAS installed on my PC. Let's assume your data looks like this (when saved as a tab-delimited file):

            Some title that does not interest us
Author  Dirk Horsten
Date    01-Jan-15
Other   Irrelevant thing

        Bar     Foo     Val     Remark
        A       Alfa    1       This is the first line
        B       Beta    2       This is the second line
        C       Gamma   3       This is the last line

      

So the actual data starts in cell C6 with the column heading "Bar". Suppose also that we know we will find the columns "Foo", "Bar" and "Val", in an unknown order and perhaps alongside other columns we are not interested in, and that we do not know in advance how many rows of data there are.

Now we naively import the sheet a first time and query sashelp to see what was read:

/** First pass: import everything, to explore the content of the sheet **/
proc import datafile="&file_name" out=temp_out dbms=excelcs replace;
    sheet="&sheet_name";
run;

/** Find out what SAS read in **/
proc sql;
    /* number of columns in the imported data set */
    select count(*) into :nrCols
    from sashelp.vcolumn where libName = 'WORK' and memName = 'TEMP_OUT';

    /* space-separated list of the column names */
    select name into :tempCols separated by ' '
    from sashelp.vcolumn where libName = 'WORK' and memName = 'TEMP_OUT';
quit;

      

Next, we look for the headers and the data, so we know how to read the sheet correctly. This works only if all columns have been interpreted as character values, and unfortunately Excel cannot be forced to do so.



data _null_;
    set temp_out end=last;
    /* one array over all imported columns; they must all be character */
    array temp {*} &tempCols.;

    retain foo_col bar_col val_col range_bottom 0;
    if not (foo_col and bar_col and val_col) then do;
        range_left = 0;
        range_right = 0;

        /* Find out if we finally found the headers */
        do col = 1 to &nrCols.;
            select (upcase(temp(col)));
                when ('FOO') do;
                    foo_col = col;
                    if not range_left then range_left = col;
                    range_right = col;
                end;
                when ('BAR') do;
                    bar_col = col;
                    if not range_left then range_left = col;
                    range_right = col;
                end;
                when ('VAL') do;
                    val_col = col;
                    if not range_left then range_left = col;
                    range_right = col;
                end;
                otherwise;
            end;
        end;
        if (foo_col and bar_col and val_col) then do;
            /** remember where the headers were found **/
            range_top = _N_ + 1;
            /* symputx strips the blanks that symput would leave around numbers */
            call symputx ('rangeTop', range_top);

            /* column number to letter; only valid for columns A to Z */
            rangeLeft = byte(rank('A') + range_left - 1);
            call symputx ('rangeLeft', rangeLeft);

            rangeRight = byte(rank('A') + range_right - 1);
            call symputx ('rangeRight', rangeRight);
        end;
    end;
    else do;
        /** find out if there is data on this line **/
        if (temp(foo_col) ne '' and temp(bar_col) ne '' and temp(val_col) ne '')
            then range_bottom = _N_ + 1;
    end;

    /** remember where the last data was found **/
    if last then call symputx ('rangeBottom', range_bottom);
run;
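
The data step above relies on every column of temp_out being character. As a rough, untested sketch, the same sashelp.vcolumn view can be queried to see whether any column was read as numeric before running it. The macro variable name nrNumeric is made up for this illustration:

```sas
/* Sketch: count the columns of WORK.TEMP_OUT that were read as numeric. */
/* nrNumeric is a hypothetical name, not part of the code above.         */
proc sql noprint;
    select count(*) into :nrNumeric
    from sashelp.vcolumn
    where libName = 'WORK' and memName = 'TEMP_OUT' and type = 'num';
quit;

/* A non-zero count means the array over &tempCols. would mix types */
%put NOTE: &nrNumeric column(s) of TEMP_OUT were read as numeric.;
```

If the count is not zero, the array statement will fail because an array cannot mix character and numeric variables, so those columns would have to be excluded or converted first.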

      

To calculate rangeTop and rangeBottom, we take into account that observation _N_ in SAS comes from row _N_ + 1 in Excel, because the first row in Excel is interpreted as headers.

To calculate rangeLeft and rangeRight, we take the positions of the leftmost and rightmost columns of interest in the data we read and convert them to column letters.

Now we read only the relevant data:
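
The conversion byte(rank('A') + n - 1) only produces single letters, so it covers columns A to Z. A possible sketch of a more general conversion follows, assuming an ASCII platform where the letters A to Z have consecutive codes. The names col_index, col_label, n and remainder are illustrative and do not appear in the code above:

```sas
/* Sketch: convert a 1-based column number to an Excel column label. */
/* Expected labels: 1 -> A, 26 -> Z, 27 -> AA, 52 -> AZ, 703 -> AAA  */
data _null_;
    length col_label $3;
    do col_index = 1, 26, 27, 52, 703;
        col_label = '';
        n = col_index;
        /* peel off one base-26 "digit" at a time, rightmost letter first */
        do while (n > 0);
            remainder = mod(n - 1, 26);
            col_label = cats(byte(rank('A') + remainder), col_label);
            n = int((n - 1) / 26);
        end;
        put col_index= col_label=;
    end;
run;
```

The loop subtracts 1 before each division because Excel column labels have no zero digit: Z is followed by AA, not by something like A0.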

/** Second pass: read in the actual data **/
proc import datafile="&file_name" out=&out_ds dbms=excelcs replace;
    sheet="&sheet_name";
    /* e.g. C6:E9 : top-left cell, colon, bottom-right cell */
    range="&rangeLeft.&rangeTop.:&rangeRight.&rangeBottom.";
run;

      

Good luck. If you have SAS on your machine, feel free to test this code and correct it.



The following should work no matter how many rows precede your data, as long as those rows are completely empty.

libname xl excel 'C:\somefile.xlsx';

data sheet;
    set xl.'Sheet1$'n;
run;

libname xl clear;

      



This treats your Excel workbook as a database, with each sheet mapped directly to a table. I should note that my setup is 64-bit SAS 9.4 with 64-bit Excel; I understand that this approach may not work as expected if, for example, you have 64-bit SAS and 32-bit Excel.
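
Because the sheets show up as tables of the library, their names can be listed before choosing one to read. A small sketch, reusing the placeholder path from above and assuming the same Excel libname engine is available:

```sas
libname xl excel 'C:\somefile.xlsx';   /* same placeholder path as above */

/* List the members of the library without printing each one's details. */
/* Worksheets normally appear with a trailing $, for example Sheet1$.    */
proc contents data=xl._all_ nods;
run;

libname xl clear;
```

A name found this way is written as a name literal in a set statement, as in xl.'Sheet1$'n above.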
